Lucee 5.3.5.92 Java Non-Heap Memory Steadily Increases

My installation of Lucee seems to have steadily increasing Java non-heap memory. Some growth is expected, but it increases all the way up to 100% over a span of 12-18 hours. The Java heap space is fine, normally hovering anywhere from 40-60%.

My Java settings are as follows.

-server -Xms6144m -Xmx6144m -Xss2048k -Djava.awt.headless=true -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager -Duser.timezone=America/New_York -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Dfile.encoding=UTF-8

The machine has 32GB of RAM. “top” shows the “java” process is using around 7GB of RAM.

Right now, I’m restarting Lucee every 24 hours or so to clear the non-heap space, but I don’t think that’s something I should have to do, right? How would I troubleshoot what’s taking up all the non-heap space? Also, how can I find out the actual size of the non-heap space? At the moment I’m getting the % from the Lucee Admin Overview.

Charles Tang

Don’t forget to tell us about your stack!

OS: CentOS 7.7.1908
Java Version: 1.8.0_161
Tomcat Version: Apache Tomcat/9.0.31
Lucee Version: 5.3.5.92

So when you refer to “non-heap”, do you mean you see that reported somewhere? Or do you just mean you see the java process get much higher than that 7g over time, and you know the max heap is 6g, so you are just referring to the difference as “non-heap”? Just curious.

As for what it could be, well, I note you don’t have a MaxMetaspaceSize JVM arg. It COULD be the metaspace which is rising, and while some tools could tell you WHAT its size is, you could also start by just adding a value like 1g (or 1000m, the exact number doesn’t matter), i.e. -XX:MaxMetaspaceSize=1000m. At least then, if you found that the Java/Lucee process no longer “kept creeping way beyond” several GB, you could infer that’s the “problem”. If it is, you will of course also start to see “OutOfMemoryError: Metaspace” errors once that limit is hit.
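So, taking the args line you posted (added wherever your Tomcat picks up its JVM options, such as setenv.sh or the service config), it would become something like this. The 1000m is just a starting value to test the theory, not a recommendation:

-server -Xms6144m -Xmx6144m -Xss2048k -XX:MaxMetaspaceSize=1000m -Djava.awt.headless=true -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager -Duser.timezone=America/New_York -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Dfile.encoding=UTF-8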

The next question would then be WHY you are blowing out the metaspace. But let’s take things one step at a time.

And I’m sure others will chime in with other suggestions for you to consider. But if you could answer those first questions above, it may help them in their suggestions also.

Everything Charlie said. Plus, add FusionReactor to the box and look at all the memory spaces it reports on, in addition to the number of loaded class files. Some recent Lucee versions had issues never letting go of class files. I thought that was addressed in 5.3.5, but to be honest all the versions start to run together in my mind. I would also test on 5.3.6-RC for good measure. It has a bunch of fixes for class-loading issues that existed in 5.3.5.
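If you want a quick read on that loaded-class count without FR, the JVM exposes it too. A rough sketch you could drop in a scratch .cfm (just illustrative, using the standard Java management beans, not anything Lucee-specific):

<cfscript>
    // Read the JVM's class-loading stats via the standard ClassLoadingMXBean
    clBean = createObject( "java", "java.lang.management.ManagementFactory" ).getClassLoadingMXBean();
    writeOutput( "Currently loaded classes: " & clBean.getLoadedClassCount()
        & " (total ever loaded: " & clBean.getTotalLoadedClassCount()
        & ", unloaded: " & clBean.getUnloadedClassCount() & ")" );
</cfscript>

If that first number climbs steadily and never drops, that would point toward the class-file issue.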

Yep, excessive class-loading will put pressure on the metaspace (just as, some will recall, it did on the perm space in Java 7 and earlier). And like you, I couldn’t remember which Lucee version it was where that was more of an issue and/or where it was resolved, but I remember seeing the discussion pass by here in the past.

And yep on FR. Folks interested can see the Resources > Memory Spaces page, and in the top right corner you can choose the various Java memory spaces (heap and non-heap, including the metaspace) to see a graphical display of the memory used, allocated, and max in each space (over the past minute, hour, day, or week – up to the last restart of Lucee/CF/Java/whatever). Do note that if you have no MaxMetaspaceSize set, there will of course be no “max” value on that graph. :)

There was a similar discussion a few days ago: https://lucee.daemonite.io/t/memory-usage-jump-5-3-5-78-to-92/6813
I see some kind of issue with Java Metaspace use in my setup as well.

I posted some lines of CF code there, where you can see memory usage without FR (although it is a useful tool in general)
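(Roughly along these lines – a simplified sketch reading the JVM’s MemoryMXBean from CFML, not the exact code from that thread:)

<cfscript>
    // Overall heap vs non-heap usage straight from the JVM
    memBean = createObject( "java", "java.lang.management.ManagementFactory" ).getMemoryMXBean();
    nonHeap = memBean.getNonHeapMemoryUsage();
    heap    = memBean.getHeapMemoryUsage();
    writeOutput( "Non-heap used: " & numberFormat( nonHeap.getUsed() / 1024 / 1024 ) & " MB<br>" );
    writeOutput( "Heap used: " & numberFormat( heap.getUsed() / 1024 / 1024 ) & " MB" );
</cfscript>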

Thanks for the information, Charlie, Micha and Brad. The “non-heap space” I’m seeing is on the Lucee Admin Overview page. Whenever I restart, it starts off very low (~10%). This steadily increases over a few hours until it hits close to 100%. When it does, Lucee starts to act erratically. Some calls would be super fast while others take 5-10 seconds to run. These are calls that would normally take <100ms.

Restarting Lucee would then make it work again. So, I’ve been restarting Lucee every 8 hours or so. Not exactly a tenable position since we’re running an ecommerce store on Lucee. :( Restarting means people who are in the midst of shopping may see an error page during the restart.

I’ve now upgraded to 5.3.6.53-RC and am monitoring it to see whether it fixes it. I’m also going to install FR to give more insight into what’s eating up the memory. Considering this server has 32GB of RAM, it should definitely be plenty for our usage.

I’ve also added a -XX:MaxMetaspaceSize=3g for now and the meta space used is being logged in catalina.out. So, that does give more insight as well.

Closely monitoring this. Will report back after a few hours.

Thanks a lot, guys!

Charles Tang

@instantestore - did upgrading to 5.3.6.53-RC solve this for you?

Looks like it does help somewhat in that the non-heap doesn’t go to 100% now. It steadily builds up to around 50%. Since I allocated 3GB of MaxMetaSpace, it must be using around 1.5GB of meta space.

I’m not sure whether it should be that high or what’s in it, but I’ll continue to monitor to see whether it keeps going up or stays there.

I’ll let you guys know in a few days.

Charles

the non-heap doesn’t go to 100% now. It steadily builds up to around 50%. Since I allocated 3GB of MaxMetaSpace, it must be using around 1.5GB of meta space.

Unless I’m mistaken, metaspace is part of the heap now, so it sounds like you may be making some assumptions about what is where. My suggestion to you to use FusionReactor to get much much more specific details on exactly what memory spaces are filling up still stands.

I’ll also add that the Java garbage collector is lazy and doesn’t collect things unless it needs to. When you have a very large heap assigned to the JVM, it can be normal and even expected that the JVM will make good use of it and will leave things uncollected until it either needs that space back, or you force a GC (another feature FR can help you do). Unless you are getting an OOM (Out Of Memory) error, it doesn’t necessarily mean there is a problem just because there is stuff in memory that isn’t going away.
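(If you don’t have FR to hand, you can also request a full GC from CFML itself – a minimal sketch, and note the JVM treats this only as a hint and may ignore or defer it:)

<cfscript>
    // Ask the JVM for a garbage collection; this is a request, not a command
    createObject( "java", "java.lang.System" ).gc();
</cfscript>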

I can offer some (documented) clarifications here.

The metaspace is NOT in the heap. What you may be recalling, Brad, was that there was some question in Java 7 and earlier about whether the PERM space was in the heap or not. But the metaspace (new in Java 8 and above) is definitely NOT. Instead, it’s taken from available memory on the box (native or OS memory).

For any who may prefer to hear that from Oracle rather than me :), it’s documented in various places, such as here:

Starting with JDK 8, the permanent generation was removed and the class metadata is allocated in native memory. The amount of native memory that can be used for class metadata is by default unlimited. Use the option -XX:MaxMetaspaceSize to put an upper limit on the amount of native memory used for class metadata.

But indeed, as you say, GCs can impact the metaspace used, at least when objects are GCed (from the heap) whose classes are then unloaded, as the document says later:

Class metadata is deallocated when the corresponding Java class is unloaded. Java classes are unloaded as a result of garbage collection, and garbage collections may be induced to unload classes and deallocate class metadata.

Finally yes, as you say (and I did earlier), it is possible to observe the metaspace as well as other Java memory spaces (heap and non-heap) in FR, as well as in other tools.

Thanks for the clarification Charlie.

@instantestore What became of this issue?

Unfortunately I can report from here that it’s still an issue in 5.3.6.61, where it took more than 6 times the allocated memory and crashed our server last night.

We had to revert to 5.3.5.78 again.

@rd444 Can you open a ticket so the Lucee devs can help figure out what is filling your memory? I’m quite used to analyzing heap dumps to see what’s filling up the heap, but I’m not sure how to troubleshoot non-heap memory outside of using FusionReactor to home in on exactly which non-heap memory space is filling up. If you don’t want to pay the $$ for FR, you should at least find another tool that will measure and report on each non-heap space. Otherwise, this will just be a black box forever for you.
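Even without a dedicated tool, you can get a rough per-pool breakdown straight from the JVM. A sketch you could run from a scratch .cfm (pool names vary by JVM and GC, but you should see Metaspace, Compressed Class Space, Code Cache and so on alongside the heap pools):

<cfscript>
    // List every JVM memory pool (heap and non-heap) with its current usage
    pools = createObject( "java", "java.lang.management.ManagementFactory" ).getMemoryPoolMXBeans();
    for ( pool in pools ) {
        usage = pool.getUsage();
        writeOutput( pool.getType().toString() & " / " & pool.getName() & ": "
            & numberFormat( usage.getUsed() / 1024 / 1024 ) & " MB used"
            & ( usage.getMax() > 0 ? " of " & numberFormat( usage.getMax() / 1024 / 1024 ) & " MB max" : ", no max set" )
            & "<br>" );
    }
</cfscript>

Run that while the problem is building, and whichever pool keeps growing is your culprit.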

Bug was filed:

https://luceeserver.atlassian.net/browse/LDEV-2904

If it started with 5.3.5.78, that change was RAM drive related.

@Zackster The RAM drive is stored in heap, not in non-heap memory. In fact, every single thing being discussed in that ticket’s comments is related to heap memory, not non-heap. My advice from months ago still stands-- the OP needs to identify the exact memory space that is filling up to find the cause. It seems this still has not happened.

We noticed similar behaviour, where memory usage by the Apache Commons Daemon Service Runner creeps up until it sits in the high 90s, and we start seeing the occasional request timeout, which tells us it’s time to restart Lucee. If left alone, the server will eventually crash.

We first addressed this by adding a pagepoolclear() command to a nightly script, which bought us some extra time but didn’t solve the issue.
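(That nightly script is essentially just a scheduled .cfm calling the built-in function – simplified here:)

<cfscript>
    // Run by a Lucee scheduled task overnight; clears Lucee's pool of compiled pages/templates
    pagePoolClear();
</cfscript>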

Whilst doing a reboot this morning, I took a server out of the pool and ran Windows Updates, and noticed a stack of memory freed up: memory used by the Apache service dropped to 25MB over the course of several minutes.

Is there a simple way I can run a more aggressive garbage collection process, perhaps similar to what may have happened while I was running Windows Updates? Am I contributing to this overload of memory by using a cfinclude tag to read a bunch of application settings on every page? If so, is there a better way to do this? And shouldn’t the pagePoolClear() function free up these resources anyway?

We’re using Windows Server 2019, Lucee 5.3.7.47, Java 11.0.7, Tomcat 9.0.35.

Thanks… Simon

@Simon_Goldschmidt take a look at this thread. Your issue sounds similar, and applying the latest 5.3.8.139-RC may solve it.

Thanks Julian… we have upgraded one server to Lucee 5.3.8.139-RC and will run it for a week and report any differences.

No particular difference between Lucee 5.3.7 and 5.3.8 for us. Memory usage builds through the day, and the nightly pagePoolClear() does free up resources. It looks like our servers are simply running out of memory when there is a spike in traffic. We’ll try reducing the Maximum Memory Pool allocated to Java, and throw more memory at the servers if this doesn’t sort it out.