How to retrieve video dimensions with Lucee/ColdFusion

Peter · November 24, 2021, 12:00am

The problem:
I recently needed to display a video with the <video> tag and scale it while preserving its aspect ratio. That requires the dimensions (width and height) of the source video, and a search of the Internet failed to turn up a way to determine them.

A solution:
Use Apache Tika (https://tika.apache.org/), which can detect and extract metadata and text from over a thousand different file types.

The method:

Install Apache Tika in Lucee
Lucee already bundles the Apache Tika core. However, the bundled core is only a façade for the API and many calls to it return nothing, so you need to install the full version of Apache Tika.

Download Tika from https://tika.apache.org/download.html. I am using tika-app-2.1.0.jar, but other versions should work (adjust Bundle-Version below to match). Lucee requires third-party JAR files to be OSGi compliant. To achieve that, open the JAR file with WinRAR or 7-Zip, add these lines to the internal META-INF/MANIFEST.MF file, and save it back into the JAR file.

Bundle-Name: Apache Tika App Bundle
Bundle-SymbolicName: apache-tika-app-bundle
Bundle-Description: Apache Tika App jar converted to an OSGi bundle
Bundle-ManifestVersion: 2
Bundle-Version: 2.1.0

Place the tika-app-2.1.0.jar file into the C:\Lucee\tomcat\lucee-server\bundles folder (adjust the path to suit your installation). When you browse to http://127.0.0.1:8888/lucee/admin/server.cfm?action=info.bundle you should see apache-tika-app-bundle listed as “not loaded”.
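
Before wiring up the full function, it can help to confirm that Lucee can actually load a class from the new bundle. The snippet below is only a sketch: it assumes the bundle name matches the Bundle-SymbolicName added to the manifest above, and "example.mp4" is a made-up file name that does not need to exist, because detect() called with a string only looks at the name.

```cfml
<cfscript>
	// Assumption: "apache-tika-app-bundle" is the Bundle-SymbolicName from the edited manifest
	tika = createObject("java", "org.apache.tika.Tika", "apache-tika-app-bundle").init();
	// detect(String) guesses the media type from the file name alone; no file is read
	writeOutput(tika.detect("example.mp4"));
</cfscript>
```

If this throws an error instead of printing a media type, the manifest edit or the bundle location is the first thing to recheck.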

Add this function to your CFML code to read and retrieve text and metadata from a file.

<cfscript>
	function getFileContent(filename) {
		var LOCAL = StructNew() ;
		LOCAL.result = StructNew() ;
		LOCAL.result.error = "" ;
		LOCAL.result.text = "" ;
		if (FileExists(filename)) {
			LOCAL.f = createObject("java", "java.io.File").init(filename);
			LOCAL.fis = createObject("java","java.io.FileInputStream").init(LOCAL.f);
			LOCAL.ch = CreateObject("java","org.apache.tika.sax.BodyContentHandler", "apache-tika-app-bundle");
			LOCAL.parser = CreateObject("java","org.apache.tika.parser.AutoDetectParser", "apache-tika-app-bundle");
			LOCAL.md = CreateObject("java","org.apache.tika.metadata.Metadata", "apache-tika-app-bundle");
			try {
				LOCAL.parser.parse(LOCAL.fis, LOCAL.ch, LOCAL.md);
				LOCAL.keys = LOCAL.md.names();
				LOCAL.result.metadata = StructNew();
				// use lte, not lt, so the last metadata key is not skipped
				for (var ii = 1; ii lte arrayLen(LOCAL.keys); ii = ii + 1) {
					LOCAL.mdval = LOCAL.md.get(LOCAL.keys[ii]);
					if (not isNull(LOCAL.mdval)) {
						LOCAL.result.metadata[LOCAL.keys[ii]] = LOCAL.mdval;
					}
				}
				LOCAL.result.text = LOCAL.ch.toString();
			} catch (any e) {
				// store the message text so error is always a string, as in the other branches
				LOCAL.result.error = e.message;
			}
			LOCAL.fis.close();
		} else {
			LOCAL.result.error = "File not found" ;
		}
		return LOCAL.result;
	}
</cfscript>

To extract text and metadata from an existing file, or from a file that has just been uploaded, use code similar to the following, adapted to your circumstances.

<!--- create a list to filter the metadata --->  
<cfset RequiredMetaData = "Content-Type,Date/Time,Exif SubIFD:Date/Time Original,Make,Model,tiff:Make,tiff:Model,Lens,F-Number,Focal Length,Focal Length 35,Exif SubIFD:Exposure Time,Exif SubIFD:F-Number,Exif SubIFD:Focal Length,Author,title,Comments,Exif SubIFD:Lens Specification,Copyright,Shutter Speed Value,GPS Latitude,GPS Longitude,GPS:GPS Latitude,geo:lat,GPS:GPS Longitude,geo:long,GPS Img Direction,Flash,Orientation,ISO Speed Ratings,Exposure Time,Image Height,tiff:ImageLength,Image Width,tiff:ImageWidth">

<!--- get file content --->
<cfset FileContent = getFileContent("#fullPathToFileName#")>

<!--- get the file text content --->
<cfif IsDefined("FileContent.text")>
	<cfset contentText = Trim(FileContent.text)>
<cfelse>
	<cfset contentText = "">
</cfif>

<!--- get the file metadata as a comma delimited string --->
<cfset fileMetaData = "">
<cfset videoWidth = 0>
<cfset videoHeight = 0>

<cfif IsDefined("FileContent.metadata")>
	<cfset qrymeta = QueryNew("Property,PropertyValue,SortOrder")>
	<cfset ii = 0>
	<cfloop collection="#FileContent.metadata#" item="prop">
		<cfif ListFindNoCase(variables.RequiredMetaData, variables.prop) gt 0>
			<cfset ii = variables.ii + 1>
			<cfset QueryAddRow(qrymeta, 1)>
			<cfset temp = QuerySetCell(qrymeta, "Property", "#ReplaceNoCase(ReplaceNoCase(ReplaceNoCase(variables.prop,'tiff:','','all'),'GPS:GPS','','all'),'Exif SubIFD:','','all')#", variables.ii)>
			<cfset temp = QuerySetCell(qrymeta, "PropertyValue", "#FileContent.metadata[variables.prop]#", variables.ii)>
			<cfset temp = QuerySetCell(qrymeta, "SortOrder", "#FindNoCase(prop, variables.RequiredMetaData)#", variables.ii)>
		</cfif>
	</cfloop>
	<cfquery dbtype="query" name="qrymeta2">
		SELECT 	 Property,
				 PropertyValue
		FROM	 qrymeta
		ORDER BY SortOrder
	</cfquery>
	<cfloop query="qrymeta2">
		<cfset fileMetaData = variables.fileMetaData & qrymeta2.Property & " = " & qrymeta2.PropertyValue & ", ">
		<!--- The Content-Type metadata isn't returned for videos, so identify them by the file extension.
		      variables.fileextension is assumed to have been set by the calling code. --->
		<cfif variables.fileextension eq "MP4" or variables.fileextension eq "M4V">
			<cfif qrymeta2.Property eq "ImageWidth">
				<cfset videoWidth = val(qrymeta2.PropertyValue)>
			</cfif>
			<cfif qrymeta2.Property eq "ImageLength">
				<cfset videoHeight = val(qrymeta2.PropertyValue)>
			</cfif>
		</cfif>
	</cfloop>
</cfif>
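
With videoWidth and videoHeight in hand, the scaling that prompted this post is simple arithmetic. The following is a minimal sketch rather than the only way to do it; maxWidth and videoUrl are hypothetical variables that are not part of the code above, so set them to suit your page.

```cfml
<!--- maxWidth and videoUrl are hypothetical; they are not defined anywhere above --->
<cfset maxWidth = 640>
<cfif videoWidth gt 0 and videoHeight gt 0>
	<!--- never scale up beyond the width of the source video --->
	<cfset displayWidth = Min(videoWidth, maxWidth)>
	<!--- preserve the aspect ratio: the height scales by the same factor as the width --->
	<cfset displayHeight = Round(displayWidth * videoHeight / videoWidth)>
	<cfoutput>
		<video src="#EncodeForHTMLAttribute(videoUrl)#" width="#displayWidth#" height="#displayHeight#" controls></video>
	</cfoutput>
</cfif>
```

For a 1920 x 1080 source and a maxWidth of 640, this gives a 640 x 360 player. The guard on zero dimensions avoids a divide-by-zero when the metadata was not found.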

Other metadata values can be retrieved as well; a <cfdump var="#FileContent#"> will reveal the metadata available for the selected file type.
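
The full dump can be long, so when hunting for the dimension keys of another file type it may be easier to list only the likely candidates. This is a sketch that assumes FileContent was populated by getFileContent() as shown earlier; the pattern "width|height|length" is just a guess at useful key names and can be widened as needed.

```cfml
<cfscript>
	// only proceed if the parse produced a metadata struct
	if (structKeyExists(FileContent, "metadata")) {
		for (key in FileContent.metadata) {
			// keep keys whose names mention width, height or length (a guessed pattern)
			if (reFindNoCase("width|height|length", key) gt 0) {
				writeOutput(encodeForHTML(key & " = " & FileContent.metadata[key]) & "<br>");
			}
		}
	}
</cfscript>
```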

Extracting the text content from documents such as PDF and Word files also lets you save that text in a database. In conjunction with SQL Server Full-Text Search, you could then create a Google-like document search with results ranking and content snippets.