I can see that Apache Tika is included with Lucee 5.2, and it can presumably extract the contents of RTF documents, among other file formats. In my case there’s an org.apache.tika.core-1.10.0.jar in the bundles directory.
However, it does not seem to be used, or it doesn’t work the way I expect. After indexing a few RTF documents with <cfindex type="file">, I can search them with <cfsearch>, but the “Summary” in the search result always contains raw RTF markup:
{\rtf1\adeflang1025\ansi\ansicpg1252\uc1\adeff0\deff0\stshfdbch0\stshfloch31506\ ...
Is this expected behavior? Shouldn’t I get the plain text contents of the file, i.e. something that can actually be shown to the user?
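For reference, a minimal reproduction could look roughly like the sketch below. The collection name `rtfdocs`, the directory paths, the file name, and the search term are placeholders for illustration, not my actual setup; only the tags and the `summary` column are the ones described above.

```cfml
<!--- Hypothetical collection name and storage path; adjust to your environment --->
<cfcollection action="create" collection="rtfdocs" path="#expandPath('./collections')#">

<!--- Index a single RTF file; for type="file" the key is the absolute file path --->
<cfindex action="update"
         collection="rtfdocs"
         type="file"
         key="#expandPath('./docs/example.rtf')#">

<!--- Search the collection; "example" is a placeholder search term --->
<cfsearch name="results" collection="rtfdocs" criteria="example">

<!--- Print the summary column, which is where the raw RTF markup shows up --->
<cfoutput query="results">
    <p>#encodeForHTML(results.summary)#</p>
</cfoutput>
```

With a setup like this, I would expect each summary to be readable text rather than a string starting with `{\rtf1`.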