Memory issue with 5.3.6.61

did you note how many items (pageContexts) it contained?

the problem with application.log not being written was subsequently fixed; i'm guessing there are some useful clues about this problem which we can't see because of it

Where would I see that number?

it’s in one of the tree views in MAT, can’t remember it off the top of my head

I’ve taken a look at the heap dump and I see a lot of stuff in here, but I’m not sure which parts are giving you memory issues. Here are the top items in the dominator report (largest objects):

The Lucee CFML factory has a ConcurrentLinkedQueue named pcs that appears to be full of page context objects. It’s difficult to see how many items are in the queue since it’s a linked list, but most of these page context objects have large local scopes that contain large arrays of registration.cfc instances, some of which are 1.5 megs apiece. This cache of page contexts is 2.2 gigs in size. I’m unclear whether they were all in use at the time the heap dump was taken, or whether Lucee is just caching them from previous requests and failed to clear out their scopes to release the data.

There is a thread named pool-470-thread-1 that is retaining 648 megs of heap as part of a concurrent FutureTask. I can’t tell what started the thread or what it’s doing, but it has a lot of thread-local data, including a page context reported as 642 megs that contains, among other things, a closure scope with a key called processLens.

There’s a pretty big struct in the server scope that’s about 150 megs, another around 62 megs, and a third that’s about 50 megs: a total of around 267 megs of retained objects in the server scope.

I would focus on those page context objects. Based on what @Zackster has said, I think there’s a chance Lucee is recycling them between threads but failing to clear out their scopes. If you call a lot of closures, each of those calls creates an arguments scope and a local scope that live in the pc, and I think those can really build up over a request. I’m actually not 100% certain whether a closure’s scopes are released as soon as the closure has finished executing or only when the pc is released.
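To illustrate what I mean (a minimal hypothetical sketch, not code from the actual heap dump): every invocation of a closure like the one below gets its own arguments and local scope tied to the page context, so heavy iteration within a single request can generate thousands of short-lived scope objects.

```
// Hypothetical CFScript example: the closure passed to .each() is
// invoked once per element, and every invocation creates a fresh
// arguments scope and local scope associated with the page context.
nums = [];
for ( i = 1; i <= 10000; i++ ) {
    nums.append( i );
}

doubled = [];
nums.each( function( n ){
    // this "var" lives in the local scope of this single invocation
    var result = n * 2;
    doubled.append( result );
} );
```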

Thanks for your help, Brad.

We can discount the server scope structs as we use that scope for general caching, so they’re expected to persist in the heap.
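For context, the kind of server-scope caching we do looks roughly like this (names and data are illustrative, not our actual code):

```
// Illustrative only: long-lived lookup data parked in the server scope,
// which is expected to remain in the heap for the lifetime of the engine.
if ( !structKeyExists( server, "myAppCache" ) ) {
    server.myAppCache = {};
}
// "lookupData" is a placeholder for whatever we cache across requests
server.myAppCache[ "lookupData" ] = { "regions": [ "UK", "EU", "US" ] };
```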

processLens is a closure within the task that’s triggering the issue. There are three levels of closure:

  1. An array of objects calls the outer closure in parallel: objects.Each( outerFunction, parallel )
  2. outerFunction() in turn iterates over the sub closure processLens()
  3. processLens() in turn iterates over another innerFunction() closure (twice).

So there’s a lot of nested iteration going on, and if PageContext objects are being created but not cleared/emptied then that sounds to me like a plausible explanation for the memory growth.
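As a rough sketch of that call structure (the names mirror the description above, but the data and bodies are placeholders, not our real task code):

```
// Placeholder data standing in for our real objects/lenses/items
objects = [ {}, {}, {} ];
lenses  = [ "a", "b", "c" ];
items   = [ 1, 2, 3 ];

objects.each( function( obj ){              // the outer closure
    lenses.each( function( lens ){          // processLens
        for ( var pass = 1; pass <= 2; pass++ ) {
            items.each( function( item ){   // innerFunction, iterated twice per lens
                var result = item;          // placeholder work
            } );
        }
    } );
}, true );                                  // true = run the outer .each() in parallel
```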

Thanks to you and Zac, I feel I have enough info to log a ticket linking to the ArrayEach()/PageContext changes that may be the cause.

https://luceeserver.atlassian.net/browse/LDEV-3210


Micha has issued a speedy patch which, I’m delighted to report, seems to have fixed the issue.

Not only that, but with the patch applied on both our servers, overall heap usage seems to have gone down compared to the previous 5.3.4.x version we were using.

Special thanks to @Zackster and @bdw429s for their help getting this resolved.


Excellent news @Julian_Halliwell, I can’t wait to test it. There is actually a related pull request for ColdBox where I had been discussing the memory overhead of calling closures a lot of times in Lucee. I had a suspicion that the arguments and local scopes in all of those closures (called thousands of times per page in some cases) were responsible for the heap usage, which could be noticeable under significant load. We removed some struct.filter( ()=>{} ) code from the framework and replaced it with a basic loop instead. I’m curious now whether it performs any differently with this change.
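As a rough illustration of that kind of change (hypothetical code, not the actual ColdBox diff): a closure-based filter creates an arguments and local scope for every element, whereas a plain loop does not.

```
modules = { "a": { enabled: true }, "b": { enabled: false } };

// Before (hypothetical): one closure invocation, and therefore one
// fresh arguments + local scope, per key in the struct.
enabledModules = modules.filter( function( key, value ){
    return value.enabled;
} );

// After (hypothetical): a basic loop does the same filtering without
// creating per-element closure scopes.
enabledModules = {};
for ( key in modules ) {
    if ( modules[ key ].enabled ) {
        enabledModules[ key ] = modules[ key ];
    }
}
```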

cc/ @Dominic_Watson


there’s a hardwired cache pool of 50 argument and local scopes; I’m kinda curious whether some of these hardwired limits hinder performance under load

a hardwired cache pool of 50 argument and local scopes

Is this pool per page context/thread or for the entire server? I assume it’s allowed to grow on demand as necessary? I’m curious whether it shrinks back down to 50 when no requests are running.

a good question, (pretty sure) it’s per PageContext, so the question is not so much about load, but rather how deeply nested the call chains are under common frameworks / apps.

there doesn’t seem to be any shrinking logic