PDF to JPG problem

OS: Windows Server 2012
Java Version: 11.0.6
Tomcat Version: Apache Tomcat/9.0.31
Lucee Version: Lucee 5.3.8.3-SNAPSHOT

Hey,

I’m currently trying to make a thumbnail of a PDF file. I noticed that the cfpdf tag doesn’t work well: it did not generate an image file. Now I’m using code I found here: link but I’m getting an error:

No matching Method for writeImage(org.apache.pdfbox.pdmodel.PDDocument, string, string, numeric, numeric, string, numeric, numeric) found for org.apache.pdfbox.util.PDFImageWriter

this is strange because when I writedump the function itself I get:

boolean writeImage(org.apache.pdfbox.pdmodel.PDDocument, java.lang.String, java.lang.String, int, int, java.lang.String, int, int)

I’m calling it like this:

imageWriter.writeImage(document, JavaCast("string","jpg"), JavaCast("string",""), arguments.page, arguments.page, JavaCast("string",returnfileprefix), BufferedImage.TYPE_INT_RGB, 200);

or without the last two arguments. Am I doing something wrong? I have the feeling the parameters are not correct but I tried a lot of things and still the same error. Some help would be appreciated!

Thanks!

I’m not sure what the issue is but it could be to do with the version of PDF Box you’re using, which I assume is the one Lucee loads? i.e. you’re not loading it separately?
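If you're unsure which PDFBox jar is actually being picked up, one rough way to check is to ask the JVM where the class was loaded from. This is only a sketch: it assumes PDDocument can be instantiated without a jar path, and the code source may be unavailable for some class loaders (for example under OSGi), so the result is a hint rather than proof.

```cfml
<cfscript>
// Instantiate an empty document only to inspect where its class came from
doc = createObject( "java", "org.apache.pdfbox.pdmodel.PDDocument" ).init();
try {
	codeSource = doc.getClass().getProtectionDomain().getCodeSource();
	// getCodeSource() can return null for some class loaders, so guard the call
	if ( isNull( codeSource ) ) {
		writeOutput( "Code source not available for this class loader" );
	} else {
		// typically prints the jar (or bundle) location the class was loaded from
		writeOutput( codeSource.getLocation().toString() );
	}
} finally {
	doc.close();
}
</cfscript>
```

If the location points at a 2.x or later jar, a missing PDFImageWriter method is expected rather than a casting problem.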

For my own thumbnail workaround I’m using pdfbox-app-2.0.19 and the PDFRenderer class. This isn’t exactly how I’m doing it, but should give you an idea:

pdfBoxJarPath = ExpandPath( "/library/pdfBox/2_0/pdfbox-app-2.0.19.jar" );
// create a test pdf (or supply an existing path)
pdfPath = ExpandPath( "thumbtest.pdf" );
document overwrite="true" format="pdf" filename=pdfPath{ WriteOutput( "Bonjour Monde" ) };
try{
	pdfFile = CreateObject( "java", "java.io.FileInputStream" ).init( JavaCast( "string", pdfPath ) );
	reader = CreateObject( "java", "org.apache.pdfbox.pdmodel.PDDocument", pdfBoxJarPath );
	pdf = reader.load( pdfFile );
	pageNumber = 0; //use the first page
	renderer = CreateObject( "java", "org.apache.pdfbox.rendering.PDFRenderer", pdfBoxJarPath ).init( pdf );
	imageType = CreateObject( "java", "org.apache.pdfbox.rendering.ImageType", pdfBoxJarPath )[ "RGB" ];
	dpi = 72;
	bufferedImage = renderer.renderImageWithDPI( JavaCast( "int", pageNumber ), JavaCast( "float", dpi ), imageType );
	// create, resize and write out a CFML image object
	image = ImageNew( bufferedImage );
	ImageScaleToFit( image, 100, "", "highestQuality" );
	ImageWrite( image, ExpandPath( "thumb.jpg" ) );
}
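finally{
	// close only what was actually opened, so an earlier failure is not masked by a second error here
	if( !IsNull( pdf ) ) pdf.close();
	if( !IsNull( pdfFile ) ) pdfFile.close();
}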

If you are using PDFBox 2.x.x, that example code will no longer work as the PDFImageWriter class was removed with that release. This is noted here: Apache PDFBox | PDFBox 2.0.0 Migration Guide under the PDF Rendering section.


Thanks, Tony. I knew there must have been a reason why I switched to PDFRenderer :slightly_smiling_face:


Thanks for providing an example as I did not have one handy other than saying it won’t work :sweat_smile:

This creates a png image from a pdf…

<cffunction name="PDFpageToImage" output="false">
    
  <cfargument name="filebytes" required="true">
  <cfargument name="pageNo" default="0">
  <cfargument name="dpi" default="72">
  
  <cfscript>
    /* creates a 72dpi base64 png image of pdf page input as bytes (could also pass in filename)
	   for byteArray return, do return after baos.close() and doc.close()
	   'org.apache.pdfbox.app' is in bundles directory :)
	*/
    var PDDocument = CreateObject("java", "org.apache.pdfbox.pdmodel.PDDocument", "org.apache.pdfbox.app", "2.0.18");
    var PDFRenderer = CreateObject("java", "org.apache.pdfbox.rendering.PDFRenderer", "org.apache.pdfbox.app", "2.0.18");
	var ImageIO = CreateObject("java", "javax.imageio.ImageIO");
	
	
	var doc = PDDocument.init().load(arguments.filebytes);
    //var doc = PDDocument.init().load(arguments.filename);
       
	var renderer = PDFRenderer.init(doc);
	
	var image = renderer.renderImageWithDPI(arguments.pageNo, arguments.dpi);
		
	var baos = createObject("java","java.io.ByteArrayOutputStream").init();
	
	ImageIO.write(image, "PNG", baos);
	
	baos.flush();
	
	var imageInBytes = baos.toByteArray();
	
	baos.close();
	
	doc.close();
		
	var Base64 = CreateObject("java", "java.util.Base64");
		
	var b64 = Base64.getEncoder().encodeToString(imageInBytes);
			
	return b64;
	
	//output: 'res' is the returned base64 (b64)
	//dataurl = "data:image/png;base64,#res#"; see pagetoimage.cfm
	
  </cfscript>
  
</cffunction>

Thanks guys, yes we have multiple PDF jar files so that’s probably it. Thanks for the code examples for clearing it up!

That’s why the bundles directory is awesome! You can have multiple versions of a jar file and they are all separate. (I mention this in case anyone has not tried the bundles directory).

When I run this script I get OSGi Bundle org.apache.pdfbox.app is not available…

I look in Lucee admin under Bundles and it is there but says not loaded. I have searched everywhere and cannot find how to load it. Any help is appreciated.

@paulwhita Welcome!!!

Try loading the OSGi-ready jar, similar to what is shown here

Thank you for the quick response. It did help me figure out I was trying to load the wrong version.

I am trying to run the script above from @Andrew2. On line 16:

var doc = PDDocument.init().load(arguments.filebytes);

I get:

No matching Method/Function for org.apache.pdfbox.pdmodel.PDDocument.load(string) found

Any idea? Not much of a java guy.

Same here!!

Can you dump the object and the instantiated object?

dump(PDDocument);

and

dump(PDDocument.init());

What type of data is arguments.filebytes? Is it binary data (byte array) that represents a file? Based on the error message you shared, it seems to think you’re passing a string value. The load method can take a byte array, Java File, or Java InputStream.
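To illustrate the difference, here is a minimal sketch of calling the PDFBox 2.x version of PDFpageToImage with real binary data. The file name is a placeholder, and it assumes the function from earlier in the thread is already defined on the page.

```cfml
<cfscript>
// Hypothetical path; substitute a PDF that exists on your server
pdfPath = expandPath( "sample.pdf" );

// fileReadBinary() returns a byte array, which is what load(byte[]) expects.
// Passing the path itself (a string) leads to "No matching Method ... load(string)".
pdfBytes = fileReadBinary( pdfPath );

// Quick sanity check before calling into PDFBox
if ( !isBinary( pdfBytes ) ) {
	throw( "Expected binary data for the PDF" );
}

b64 = PDFpageToImage( filebytes = pdfBytes );
</cfscript>
```

If you only have a path, the alternative is to wrap it in a java.io.File or FileInputStream, as in the PDFRenderer example further up.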

This may not be your situation, but Lucee just upped their PDFBox version to the v3 release candidate, which no longer has the PDDocument.load method. Documents are loaded through the Loader class instead (Loader.loadPDF). This post and the example he gave helped me figure it out: java - Apache PDFBox cannot find class 'Loader'. Why? - Stack Overflow

This was how I modified @Andrew2 's code:

var PDFRenderer = CreateObject("java", "org.apache.pdfbox.rendering.PDFRenderer");
var ImageIO = CreateObject("java", "javax.imageio.ImageIO");
var Loader = CreateObject("java", "org.apache.pdfbox.Loader");

// load PDF
var file = CreateObject( "java", "java.io.File" ).init( JavaCast( "string", arguments.filename ) );
var doc = Loader.loadPDF(file);

// render pdf
var renderer = PDFRenderer.init(doc);


Building on @mightyflea’s last post, here is the updated function in full, working on Lucee 5.3.10.120:

<cffunction name="PDFpageToImage" output="false"> <!--- https://lucee.daemonite.io/t/pdf-to-jpg-problem/7417/5 --->
    
  <cfargument name="filename" required="true">
  <cfargument name="pageNo" default="0">
  <cfargument name="dpi" default="72">
  
  <cfscript>
    /* creates a 72dpi base64 png image of pdf page input as bytes (could also pass in filename)
	   for byteArray return, do return after baos.close() and doc.close()
	   'org.apache.pdfbox.app' is in bundles directory :)
	*/
	
	PDFRenderer = CreateObject("java", "org.apache.pdfbox.rendering.PDFRenderer");
	ImageIO = CreateObject("java", "javax.imageio.ImageIO");
	Loader = CreateObject("java", "org.apache.pdfbox.Loader");
	
	//load PDF
	file = CreateObject( "java", "java.io.File" ).init( JavaCast( "string", expandPath(arguments.filename) ) );
	doc = Loader.LoadPDF(file);
	
	// render pdf
	renderer = PDFRenderer.init(doc);
       
	var renderer = PDFRenderer.init(doc);
	
	var image = renderer.renderImageWithDPI(arguments.pageNo, arguments.dpi);
		
	var baos = createObject("java","java.io.ByteArrayOutputStream").init();
	
	ImageIO.write(image, "PNG", baos);
	
	baos.flush();
	
	var imageInBytes = baos.toByteArray();
	
	baos.close();
	
	doc.close();
		
	var Base64 = CreateObject("java", "java.util.Base64");
		
	var b64 = Base64.getEncoder().encodeToString(imageInBytes);
			
	return b64;
	
	//output: 'res' is the returned base64 (b64)
	//dataurl = "data:image/png;base64,#res#"; see pagetoimage.cfm
	
  </cfscript>
  
</cffunction>

We fixed CFPDF thumbnail support in 1.1.0.17-SNAPSHOT of the PDF extension, so try version 1.1.0.19?

https://luceeserver.atlassian.net/browse/LDEV-967


Built-in image generation via ‘thumbnail’ is great, but the max resolution is not enough for all use cases. For anyone who stumbles here, this is a working snippet that squeezes out the maximum quality available via the thumbnail route:

cfpdf(
	action="thumbnail",
	source=expandPath( "sample.pdf" ),
	destination=expandPath( "" ),
	resolution="high",
	format="png",
	overwrite="yes",
	pages="1",
	scale="100",
	transparent="no",
	imageprefix=""
);

Very nice, thank you for sharing!

I could be mistaken (depending on your settings) - I believe there may be a couple unscoped vars and a duplicate renderer in the snippet.
Below are proposed fixes (where unexpected results may occur under load).


Good thinking! Here is a new version improved by AI:

<cffunction name="convertPdfPageToImage" output="false" returntype="string" hint="Converts a specified PDF page to a base64-encoded PNG image.">
  <cfargument name="filename" type="string" required="true" hint="The path to the PDF file.">
  <cfargument name="pageNo" type="numeric" default="0" hint="The zero-based page index to convert (0 is the first page).">
  <cfargument name="dpi" type="numeric" default="72" hint="The DPI (dots per inch) of the output image.">
  
  <cfscript>
    var PDFRenderer = createObject("java", "org.apache.pdfbox.rendering.PDFRenderer");
    var ImageIO = createObject("java", "javax.imageio.ImageIO");
    var Loader = createObject("java", "org.apache.pdfbox.Loader");
    
    var file = createObject("java", "java.io.File").init(expandPath(arguments.filename));
    
    if (!file.exists()) {
      throw("The specified file does not exist.");
    }
    
    var doc = Loader.loadPDF(file);
    
    try {
      if (arguments.pageNo lt 0 or arguments.pageNo gte doc.getNumberOfPages()) {
        throw("The specified page number is out of range.");
      }
      
      var renderer = PDFRenderer.init(doc);
      var image = renderer.renderImageWithDPI(arguments.pageNo, arguments.dpi);
      var baos = createObject("java", "java.io.ByteArrayOutputStream").init();
      
      ImageIO.write(image, "PNG", baos);
      var imageInBytes = baos.toByteArray();
    } finally {
      // always release the document, even if validation or rendering throws
      doc.close();
    }
    
    return binaryEncode(imageInBytes, "base64");
  </cfscript>
</cffunction>
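
For anyone wondering what to do with the returned string, here is a rough usage sketch. The file names and the 150 dpi value are placeholders, not something from the posts above, and it assumes convertPdfPageToImage is defined on the same page.

```cfml
<cfscript>
// "sample.pdf" is a placeholder; pageNo is zero-based, so 0 is the first page
b64 = convertPdfPageToImage( filename = "sample.pdf", pageNo = 0, dpi = 150 );

// Option 1: embed the image directly in the page as a data URL
writeOutput( '<img src="data:image/png;base64,#b64#" alt="PDF page preview">' );

// Option 2: decode the base64 back to bytes and save it as a PNG file
fileWrite( expandPath( "thumb.png" ), binaryDecode( b64, "base64" ) );
</cfscript>
```

A higher dpi gives a sharper image at the cost of a larger base64 string, so the data URL route is probably best kept for small previews.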