Indexing RTF Files

Tomalak · February 4, 2019, 5:49pm

I can see that Apache Tika is included with Lucee 5.2. Tika can presumably extract the contents of RTF documents, among other file formats. In my case there’s an org.apache.tika.core-1.10.0.jar in the bundles directory.

However, it either isn’t being used or doesn’t work the way I expect. After indexing a few RTF documents with <cfindex type="file">, I can search them with <cfsearch>, but the “Summary” column in the search result always contains raw RTF markup:

{\rtf1\adeflang1025\ansi\ansicpg1252\uc1\adeff0\deff0\stshfdbch0\stshfloch31506\ ...

Is this expected behavior? Shouldn’t I get the plain text contents of the file, i.e. something that can actually be shown to the user?
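
A minimal sketch of the setup described above might look like the following. The collection name `rtfdocs`, both paths, and the search term are placeholders for illustration, not values from the original setup.

```cfml
<!--- Placeholder collection name and storage path (assumptions, adjust as needed) --->
<cfcollection action="create" collection="rtfdocs" path="/path/to/collections">

<!--- Index one RTF file; for type="file", key is the path to the file --->
<cfindex action="update" collection="rtfdocs" type="file" key="/path/to/docs/example.rtf">

<!--- Search the collection; "example" is a placeholder search term --->
<cfsearch name="results" collection="rtfdocs" criteria="example">

<!--- Print the summary column, which is where the raw RTF markup shows up --->
<cfoutput query="results">
  #encodeForHTML(results.key)#: #encodeForHTML(results.summary)#<br>
</cfoutput>
```

With an RTF file indexed this way, the summary printed by the last block is what comes back as `{\rtf1\...` rather than readable text.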