For any developers out there migrating Adobe ColdFusion projects to Lucee who run into memory issues like I did, here are my 2 cents.
There isn’t much about this online, so I figured either no one is having these issues and this post is of no use to anybody, or there are a few developers with the same issues who would be glad to find something online about this topic.
Project information
Migrating a stable ColdFusion web application from a ColdFusion 10 / Oracle environment to a Lucee 5.2.8.50 / MySQL 8.1 environment.
Blocking issues
The Lucee instance kept crashing with out-of-memory errors after a couple of hours. Extensive tweaking of the JVM parameters and heap size had little to no effect. Our Old Gen memory space kept filling up. Triggering a full GC manually even made it fill up faster. This issue had all the signs of a memory leak.
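For readers who want to watch the heap pools without a dedicated monitoring tool, here is a minimal sketch using the JVM's standard management beans. It is not what we ran in production; the class name `HeapPoolReport` and the method `heapPoolLines` are made up for illustration. Pool names depend on the garbage collector in use (for example "PS Old Gen", "G1 Old Gen" or "Tenured Gen"), so the sketch lists every heap pool rather than guessing a name.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Lucee or FusionReactor.
class HeapPoolReport {

    /** Returns one line per heap memory pool, including the Old Gen / Tenured pool. */
    static List<String> heapPoolLines() {
        List<String> lines = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() != MemoryType.HEAP) {
                continue; // skip non-heap pools such as Metaspace and the code cache
            }
            MemoryUsage usage = pool.getUsage();
            if (usage == null) {
                continue; // the pool is no longer valid
            }
            long usedMb = usage.getUsed() / (1024 * 1024);
            long max = usage.getMax(); // -1 when no maximum is defined
            String maxText = max < 0 ? "n/a" : (max / (1024 * 1024)) + " MB";
            lines.add(pool.getName() + ": used " + usedMb + " MB, max " + maxText);
        }
        return lines;
    }

    public static void main(String[] args) {
        heapPoolLines().forEach(System.out::println);
    }
}
```

Logging these lines at a fixed interval gives a rough picture of whether the Old Gen pool keeps growing after each collection, which is the pattern described above.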
Solution
- Using FusionReactor to monitor memory and heap usage, and taking heap snapshots and comparing them, we found some third-party Java classes that were instantiated in our code at runtime. Instances of these classes appeared to be left behind in memory. Explicitly setting the variables holding these instances to null increased the time it took our Old Gen to fill up from 3 to 6 hours. Still not enough for a stable production application, but it’s something at least.
- The heap snapshots also showed a lot of lucee.runtime.type.StructImpl instances. We learned about the type argument of Lucee’s structNew function, which lets structs use weak references that can be garbage-collected. We changed the application codebase to create most structs with structNew('weak'). This didn’t change anything about the memory leak, but it made GC runs have a slight positive effect. Do not use this for structs in your application scope, however, or GC will clear those out as well.
- Some part of the code was using a JavaLoader at runtime to load an additional jar for some specific image processing. Removing the JavaLoader altogether and loading this additional jar at startup as a Lucee lib lowered the initial heap usage after startup and made our heap last longer.
- The big bull’s-eye was tracking down all java.util.TreeMap and java.util.LinkedHashMap instances that were sometimes used in code to make a struct-like object that preserved the order of the keys. In our setup, it appears that these native Java objects weren’t being garbage-collected. Refactoring our code to replace these objects with normal arrays and structs made our entire heap run smooth and stable. This had a huge impact!
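The warning about weak structs in the application scope follows from how weak references behave on the JVM in general. The sketch below uses plain `java.lang.ref.WeakReference` to show it; it does not reproduce Lucee's internal struct implementation, and the class and method names are hypothetical. An object stays alive only as long as something else holds a strong reference to it. Once that is gone, the collector is free to clear it.

```java
import java.lang.ref.WeakReference;

// Hypothetical demo class illustrating general JVM weak-reference behaviour.
class WeakReferenceSketch {

    /** While a strong reference exists, a GC run must not clear the weak reference. */
    static boolean survivesWhileStronglyHeld() {
        Object payload = new Object();
        WeakReference<Object> ref = new WeakReference<>(payload);
        System.gc(); // only a hint to the JVM
        // 'payload' is still used here, so it stays strongly reachable across the GC.
        return ref.get() == payload;
    }

    /** With no strong reference left, the referent becomes eligible for collection. */
    static boolean clearedOnceUnreferenced() throws InterruptedException {
        WeakReference<Object> ref = new WeakReference<>(new Object());
        // System.gc() is not guaranteed to collect, so retry a few times.
        for (int i = 0; i < 20 && ref.get() != null; i++) {
            System.gc();
            Thread.sleep(50);
        }
        return ref.get() == null;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("held strongly, survives GC: " + survivesWhileStronglyHeld());
        System.out.println("unreferenced, cleared by GC: " + clearedOnceUnreferenced());
    }
}
```

The second method usually reports that the object was cleared on common JVMs, but the exact timing is up to the collector. Data that must stay around for the lifetime of the application therefore should not be reachable only through weak references.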
Our application still can’t run indefinitely without eventually running out of memory, but at least our efforts increased the stability to a workable state. The class that occupies most of the heap now is lucee.commons.collection.concurrent.ConcurrentHashMapPro, but issues with this class have already been picked up in the Lucee issue tracker.
I don’t know why these native Java objects weren’t a problem in Adobe ColdFusion. It was also running on Tomcat, but a slightly older version.
This has been a stressful witch hunt, so I hope I can save someone else some trouble by sharing what helped in our situation.