Extract Text with CFPDF?

As I read the documentation for CFPDF, there is an option to extract text from a PDF, for example:

<cfpdf source="mypdf" action="extracttext" type="xml" name="diditwork"></cfpdf>

But that results in this error message:

not supported yet, see https://issues.jboss.org/browse/LUCEE-1559

Am I missing something?

This issue has been around since Railo, and our workaround has been to call the PDFBox Java library's text extraction directly.
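A minimal sketch of that kind of PDFBox workaround follows. It is not the poster's actual code. It assumes PDFBox 2.x classes are visible to Lucee: in 2.x the stripper lives in `org.apache.pdfbox.text`, in 1.8 it was `org.apache.pdfbox.util.PDFTextStripper`, and 3.x loads files through `org.apache.pdfbox.Loader` instead of `PDDocument.load`. If the jar is not already on the classpath, you may need to load it yourself, for example via `this.javaSettings` in Application.cfc. `extractPdfText` is a hypothetical helper name.

```cfml
<cfscript>
// Hypothetical helper: extract plain text from a PDF using PDFBox directly.
// ASSUMPTION: PDFBox 2.x is on Lucee's classpath (class names differ in 1.8 and 3.x).
string function extractPdfText(required string pdfPath) {
    var file = createObject("java", "java.io.File").init(arguments.pdfPath);
    // PDDocument.load(File) is a static method in PDFBox 2.x
    var doc = createObject("java", "org.apache.pdfbox.pdmodel.PDDocument").load(file);
    try {
        var stripper = createObject("java", "org.apache.pdfbox.text.PDFTextStripper").init();
        // Emit text in visual reading order rather than raw content-stream order
        stripper.setSortByPosition(true);
        return stripper.getText(doc);
    } finally {
        // Release file handles and memory held by the document
        doc.close();
    }
}
</cfscript>
```

Usage would be something like `extractPdfText(expandPath("./mypdf.pdf"))`, where the path is a placeholder. The result is a single plain-text string, so any structure has to be recovered afterwards.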

But it looks as if it’s finally been addressed in Lucee 5.3.5.75 and PDF Extension 1.0.0.78. See this ticket: https://luceeserver.atlassian.net/browse/LDEV-1941

OK, I’ve upgraded to Lucee 5.3.5.92 and PDF 1.0.0.80.

I wasn't getting any data until I exported the PDF as a flattened PDF on my laptop. I had tried flattening it with Lucee first, but that did not work.

<cfpdf source="mypdf" action="extracttext" type="xml" name="diditwork" info="#showme#"></cfpdf>

With the flattened PDF, this does show the text.

Now the stupid questions start! For example, why can't the output be structured? It's just one long XML data field.

File a bug with a sample PDF and link it back to the above issue.

Take a look at Matt Clemente's CFC wrapper for PDFBox.