Lucee 6.1 Java Performance (POI Excel Files)

OS : Windows 10 Enterprise 64bit
Java Version : 11.0.25 (Eclipse Adoptium) 64bit
Lucee Version : 6.1.1.118

We use the Apache POI library to build excel files and noticed quite a bit of performance drop when using lucee 6.1.x. We’re using the streaming SXSSF workbook, as is recommended for large files.

The following example is running locally using Commandbox. I created a fake dataset with about 48k rows with 17 columns (this mirrors the data set size we’re seeing in our production servers that gave us notice to the issue).

In Lucee 6.0.4.10, it is averaging around 0.12 seconds per 1000 rows on my machine. The whole 48k dataset takes about 6-7 seconds total.

Updating to Lucee 6.1.1.118, the same code averages 1.5 seconds per 1000 rows, with a total time around 70-75 seconds.

I know a lot of work has been done to how Lucee is interacting with Java objects in Lucee 6. But the performance hit for this case is pretty severe. I’m wondering if there is something we should be doing differently?

I’ve included sample code to reproduce the issue. It requires the cfspreadsheet extension to be installed.

qTimes = queryNew('Desc,secElapsed,secCummulative', 'varchar,decimal,decimal');
qTimes.addRow({ desc = 'Starting: Lucee Version #SERVER.lucee.version#', secElapsed = 0, secCummulative = 0 });
row = 1;
qData = queryNew(
	'col_1,col_2,col_3,col_4,col_5,col_6,col_7,col_8,col_9,col_10,col_11,col_12,col_13,col_14,col_15,col_16,col_17'
	,'integer,varchar,integer,date,integer,decimal,varchar,varchar,varchar,varchar,integer,date,integer,decimal,varchar,varchar,varchar'
);
for(i = 1; i <= 47008; i++ ) {
	// Just filling query with random test data
	qData.addRow({
		col_1 = row++
		,col_2 = 'foo'
		,col_3 = randRange(1, 100)
		,col_4 = dateAdd('d', i-1, createDate(2024,1,1))
		,col_5 = i % 3
		,col_6 = i * (pi()-2)
		,col_7 = 'foo'
		,col_8 = 'bar'
		,col_9 = 'baz'
		,col_10 = 'foo'
		,col_11 = randRange(1, 10000)
		,col_12 = dateAdd('d', -1*(i-1), createDate(2024,1,1))
		,col_13 = i % 3
		,col_14 = i * (pi()-2)
		,col_15 = 'foo'
		,col_16 = 'bar'
		,col_17 = 'baz'
	});
}



tick = GetTickCount();
writeLog( 'Creating worksheet' );
wb = createObject('java', 'org.apache.poi.xssf.streaming.SXSSFWorkbook', 'cfspreadsheet' );
sheet = wb.createSheet('Example');

// Create reusable styles headerStyle, numberStyle, integerStyle, dateStyle
headerStyle = wb.createCellStyle();
headerStyle.setFillPattern(headerStyle.SOLID_FOREGROUND);
headerStyle.setFillForegroundColor(createObject("java","org.apache.poi.hssf.util.HSSFColor$GREY_25_PERCENT").getIndex());
headerFont = wb.createFont();
headerFont.setBoldweight(headerFont.BOLDWEIGHT_BOLD);
headerStyle.setFont(headerFont);

numberStyle = wb.createCellStyle();
numberStyle.setAlignment( numberStyle.ALIGN_RIGHT )
numberStyle.setDataFormat( wb.createDataFormat().getFormat( "##,####0.00" ) );

integerStyle = wb.createCellStyle();
integerStyle.setAlignment( integerStyle.ALIGN_RIGHT )
integerStyle.setDataFormat( wb.createDataFormat().getFormat( "##,####0" ) );

dateStyle = wb.createCellStyle();
dateStyle.setAlignment( numberStyle.ALIGN_RIGHT )
dateStyle.setDataFormat( wb.createDataFormat().getFormat( "m/d/yyyy" ) );


writeLog( 'output headers - #(getTickCount()-tick)/1000#s' );

// Create Header Rows
currentRowNum = 0;
row = sheet.createRow(currentRowNum++);
col = 0;
for( label in qData.columnArray() ) {
	cell = row.createCell(col++); cell.setCellStyle(headerStyle); cell.setCellValue("Column #ucase(label)#");
}
secElapsed = (getTickCount()-tick)/1000;
qTimes.addRow({ desc = 'Headers', secElapsed = secElapsed, secCummulative = secElapsed });

// Loop over query data
writeLog( 'Start output data - #(getTickCount()-tick)/1000#s' );
prevTick = getTickCount();
cfloop( query = qData ) {
	if (qData.currentRow % 1000 == 0 ) {
		secElapsed = (getTickCount()-prevTick)/1000;
		secCummulative = (getTickCount()-tick)/1000;
		writeLog( 'output data #qData.currentRow# - #secCummulative#s - #secElapsed#s' );
		qTimes.addRow({ desc = 'Row #qData.currentRow#', secElapsed   = secElapsed, secCummulative = secCummulative});
		prevTick = getTickCount();
	}
	col = 0;
	row = sheet.createRow(currentRowNum++);
	cell = row.createCell(col++); cell.setCellValue(qData['col_1']); cell.setCellStyle(integerStyle);
	cell = row.createCell(col++); cell.setCellValue(qData['col_2']);
	cell = row.createCell(col++); cell.setCellValue(qData['col_3']); cell.setCellStyle(integerStyle);
	cell = row.createCell(col++); cell.setCellValue(qData['col_4']); cell.setCellStyle(dateStyle);
	cell = row.createCell(col++); cell.setCellValue(qData['col_5']); cell.setCellStyle(integerStyle);
	cell = row.createCell(col++); cell.setCellValue(qData['col_6']); cell.setCellStyle(numberStyle);
	cell = row.createCell(col++); cell.setCellValue(qData['col_7']);
	cell = row.createCell(col++); cell.setCellValue(qData['col_8']);
	cell = row.createCell(col++); cell.setCellValue(qData['col_9']);
	cell = row.createCell(col++); cell.setCellValue(qData['col_10']);
	cell = row.createCell(col++); cell.setCellValue(qData['col_11']); cell.setCellStyle(integerStyle);
	cell = row.createCell(col++); cell.setCellValue(qData['col_12']); cell.setCellStyle(dateStyle);
	cell = row.createCell(col++); cell.setCellValue(qData['col_13']); cell.setCellStyle(integerStyle);
	cell = row.createCell(col++); cell.setCellValue(qData['col_14']); cell.setCellStyle(numberStyle);
	cell = row.createCell(col++); cell.setCellValue(qData['col_15']);
	cell = row.createCell(col++); cell.setCellValue(qData['col_16']);
	cell = row.createCell(col++); cell.setCellValue(qData['col_17']);
}


// Output to file
baos = createObject("java", "java.io.ByteArrayOutputStream").init();
wb.write(baos);
baos.flush();
fileWrite('C:\temp\report.xlsx', baos.toByteArray());

// Display times
writeDump( var="#qTimes#", output="browser", abort="true");

results from 6.1:

Results from 6.0:

Appreciate any guidance on this.

I can replicate your results and have experienced similar issues with Lucee 6.1 when trying to run the spreadsheet-cfml test suite.

The good news is that it all seems fine again in Lucee 6.2 :slight_smile:, which is currently a release candidate.

Incidentally, did you know that the spreadsheet-cfml library allows you to generate SXSSF files with far fewer lines of code?

It also runs on the latest Apache POI release (currently 5.4.0).

If you prefer handling the POI objects directly then there’s a createJavaObject method which will load them for you. I used that when testing your issue to avoid having to install the cfspreadsheet extension.

3 Likes

Thanks for this. I can confirm that it is MUCH faster in the latest 6.2 snapshot as well. Running the code above, I average about 0.05 seconds per 1000 rows, with a total run time under 4 seconds for the full data set.

I had looked at using spreadsheet-cfml package, especially since it’s using the latest 5.4.0 POI version. If I use the cfml components to generate the excel sheet, it is significantly slower, so I had stopped going deeper. In Lucee 6.1, it’s averaging 4.5 seconds per 1000 rows using the newStreamingXlsx worksheet. Switching to Lucee 6.2 snapshot, that improves to 0.8 seconds per 1000.

I really like the easy methods with spreadsheet-cfml but I think ultimately we’ll use the java classes loaded like you suggested, via createJavaObject. When I reexamined it, it was the same performance and gives the benefit of sunsetting cfspreadsheet and its POI version, as that plugin has gone dormant. We were also getting vulnerability warnings from some of those old packages, so it’s a win-win. We mainly used cfspreadsheet since we have a few places that used the ACF functions as we just migrated to Lucee over the last couple years.

Is there an estimated timeline for 6.2 release?

Appreciate the help!

1 Like

Yes, I ran a comparison too and the library is much slower I have to admit. I’m looking at ways of improving performance for the next release, but your direct code will probably always be faster because there’s a lot of extra work the library has to do.

Glad you’re able to make use of it anyway via createJavaObject().

I don’t know when Lucee 6.2 is due but I guess it won’t be long.

One caveat with the DIY approach: if a new version of POI or its dependencies changes the way things work (and it happens quite frequently, sometimes between major versions), then you’ll be responsible for adjusting your code when you upgrade the library.

I try and document such changes as they come up so you should find the library’s issue tracker useful.

1 Like

You could also go for a “hybrid” approach where you break it down into steps, let the library do some of them, and just write custom code where performance is critical.

The benefits include:

  • Less code overall
  • Less code you’re responsible for keeping up with POI
  • The library will handle things like SXLSX clean-up for you (getting rid of temp files used in streaming).

I’ve adapted your test code (below) with this approach and on Lucee 6.2 I’m seeing total execution times under 4 seconds, and average times of 0.06 per 1000 rows.

Using just the library (no custom code) takes around 22 seconds total, which isn’t great in comparison but the trade off may be worth it in ease of coding and maintenance.

I’ve looked into where the bottlenecks are with the library and it’s mainly down to date value handling of various kinds. Removing that would break things completely so unfortunately not an option, but I have managed some improvements in the latest release (4.3.1).

spreadsheet = New spreadsheet.Spreadsheet();
//Init performance measurement data
qTimes = queryNew('Desc,secElapsed,secCummulative', 'varchar,decimal,decimal');
qTimes.addRow({ desc = 'Starting: Lucee Version #SERVER.lucee.version#', secElapsed = 0, secCummulative = 0 });
row = 1;
// Create test data
qData = queryNew(
	'col_1,col_2,col_3,col_4,col_5,col_6,col_7,col_8,col_9,col_10,col_11,col_12,col_13,col_14,col_15,col_16,col_17'
	,'integer,varchar,integer,date,integer,decimal,varchar,varchar,varchar,varchar,integer,date,integer,decimal,varchar,varchar,varchar'
);
for(i = 1; i <= 47008; i++ ) {
	qData.addRow({
		col_1 = row++
		,col_2 = 'foo'
		,col_3 = randRange(1, 100)
		,col_4 = dateAdd('d', i-1, createDate(2024,1,1))
		,col_5 = i % 3
		,col_6 = i * (pi()-2)
		,col_7 = 'foo'
		,col_8 = 'bar'
		,col_9 = 'baz'
		,col_10 = 'foo'
		,col_11 = randRange(1, 10000)
		,col_12 = dateAdd('d', -1*(i-1), createDate(2024,1,1))
		,col_13 = i % 3
		,col_14 = i * (pi()-2)
		,col_15 = 'foo'
		,col_16 = 'bar'
		,col_17 = 'baz'
	});
}
// Start execution time measurement
tick = GetTickCount()
// Create worksheet
wb = spreadsheet.newStreamingXlsx( 'Example' );
sheet = wb.getSheetAt( 0 );
//Create reusable styles headerStyle, numberStyle, integerStyle, dateStyle
headerFormat = { fillPattern: "SOLID_FOREGROUND", fgColor: "GREY_25_PERCENT", bold: true };
headerStyle = spreadsheet.createCellStyle( wb, headerFormat );
numberFormat = { alignment: "RIGHT", dataformat: "##,####0.00" };
numberStyle = spreadsheet.createCellStyle( wb, numberFormat );
integerFormat = { alignment: "RIGHT", dataformat: "##,####0" };
integerStyle = spreadsheet.createCellStyle( wb, integerFormat );
dateFormat = { alignment: "RIGHT", dataformat: "m/d/yyyy" };
dateStyle = spreadsheet.createCellStyle( wb, dateFormat );
// Create Header Rows
currentRowNum = 0;
row = sheet.createRow(currentRowNum++);
col = 0;
for( label in qData.columnArray() ) {
	cell = row.createCell(col++); cell.setCellStyle(headerStyle); cell.setCellValue("Column #ucase(label)#");
}
secElapsed = (getTickCount()-tick)/1000;
qTimes.addRow({ desc = 'Headers', secElapsed = secElapsed, secCummulative = secElapsed });
// Loop over query data
prevTick = getTickCount();
cfloop( query = qData ) {
	if (qData.currentRow % 1000 == 0 ) {
		secElapsed = (getTickCount()-prevTick)/1000;
		secCummulative = (getTickCount()-tick)/1000;
		qTimes.addRow({ desc = 'Row #qData.currentRow#', secElapsed   = secElapsed, secCummulative = secCummulative});
		prevTick = getTickCount();
	}
	col = 0;
	row = sheet.createRow(currentRowNum++);
	cell = row.createCell(col++); cell.setCellValue(qData['col_1']); cell.setCellStyle(integerStyle);
	cell = row.createCell(col++); cell.setCellValue(qData['col_2']);
	cell = row.createCell(col++); cell.setCellValue(qData['col_3']); cell.setCellStyle(integerStyle);
	cell = row.createCell(col++); cell.setCellValue(qData['col_4']); cell.setCellStyle(dateStyle);
	cell = row.createCell(col++); cell.setCellValue(qData['col_5']); cell.setCellStyle(integerStyle);
	cell = row.createCell(col++); cell.setCellValue(qData['col_6']); cell.setCellStyle(numberStyle);
	cell = row.createCell(col++); cell.setCellValue(qData['col_7']);
	cell = row.createCell(col++); cell.setCellValue(qData['col_8']);
	cell = row.createCell(col++); cell.setCellValue(qData['col_9']);
	cell = row.createCell(col++); cell.setCellValue(qData['col_10']);
	cell = row.createCell(col++); cell.setCellValue(qData['col_11']); cell.setCellStyle(integerStyle);
	cell = row.createCell(col++); cell.setCellValue(qData['col_12']); cell.setCellStyle(dateStyle);
	cell = row.createCell(col++); cell.setCellValue(qData['col_13']); cell.setCellStyle(integerStyle);
	cell = row.createCell(col++); cell.setCellValue(qData['col_14']); cell.setCellStyle(numberStyle);
	cell = row.createCell(col++); cell.setCellValue(qData['col_15']);
	cell = row.createCell(col++); cell.setCellValue(qData['col_16']);
	cell = row.createCell(col++); cell.setCellValue(qData['col_17']);
}
// Output to file
spreadsheet.write( wb,"C:\temp\report.xlsx", true );
result = { totalMs: ( GetTickCount() - tick ), averageMsPer1000rows: ArrayAvg( qTimes.columnData( "secElapsed" ) ).Round( 3 ) };
// Display times
dump( result );
dump( var="#qTimes#", output="browser", abort="true" );
1 Like

Thanks for this! I do like this approach since it abstracts most of it to the lib. Some of our reports can generate a ton of rows, so performance is critical. Other than the size of the dataset, we’re not doing anything overly complicated.

Very much appreciated!

1 Like