Since migrating from ACF to Lucee, we’ve been seeing Lucee occasionally hang up and stop processing requests. When this happens, FusionReactor becomes basically unusable, so it’s hard to gather exactly what is happening.
What I can gather from the FusionReactor profile is that requests are hanging in areas like:
Waiting on lock <0x799570f> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
and
java.lang.ClassLoader.loadClass(ClassLoader.java:569)
- waiting on <0x330e0f60> (a lucee.commons.io.res.util.ResourceClassLoader held by thread 596892, http-nio-8080-exec-26)
java.lang.ClassLoader.loadClass(ClassLoader.java:522)
lucee.commons.lang.ClassUtil$ClassLoaderBasedClassLoading.loadClass(ClassUtil.java:896)
lucee.commons.lang.ClassUtil.__loadClass(ClassUtil.java:371)
lucee.commons.lang.ClassUtil._loadClass(ClassUtil.java:355)
lucee.commons.lang.ClassUtil.loadClass(ClassUtil.java:247)
lucee.commons.lang.ClassUtil.loadClass(ClassUtil.java:236)
lucee.runtime.reflection.Reflector.isInstaneOf(Reflector.java:131)
lucee.runtime.functions.decision.IsInstanceOf.call(IsInstanceOf.java:50)
It looks like there are some single-threaded locks that Lucee is waiting on which never free up.
I searched through Jira and the forums, but didn’t find anything that matches what I’m seeing.
What should I do to troubleshoot the issue further? What might be the cause?
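Since FusionReactor is mostly unusable while this is happening, one idea I’ve had is to capture thread dumps another way. This is just a sketch (it assumes at least a few request threads are still free to serve a small diagnostic .cfm) using the JVM’s ThreadMXBean:
// sketch: dump every thread's stack, including locked monitors/synchronizers, to a log
threadBean = createObject("java", "java.lang.management.ManagementFactory").getThreadMXBean();
infos = threadBean.dumpAllThreads(true, true);
for (info in infos) {
	writeLog(file="threaddump", text=info.toString());
}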
OS: Linux (3.10.0-1160.71.1.el7.x86_64) 64bit
Java Version: 11.0.16 (Red Hat, Inc.) 64bit
Tomcat Version: Apache Tomcat/9.0.63
Lucee Version: Lucee 5.3.9.141
Thanks!
-Dan
Try the latest snapshots!
Possible deadlock with PhysicalClassLoader [LDEV-4169] - Lucee
So this happened again. This time I was able to get some stacktraces from running requests. Most of them looked like this:
"http-nio-8080-exec-96" Id=546383 WAITING on java.util.concurrent.locks.ReentrantLock$FairSync@27171041
java.lang.Thread.State: WAITING
at java.base@11.0.16/jdk.internal.misc.Unsafe.park(Native Method)
- waiting on java.util.concurrent.locks.ReentrantLock$FairSync@27171041
at java.base@11.0.16/java.util.concurrent.locks.LockSupport.park(LockSupport.java:194)
at java.base@11.0.16/java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:885)
at java.base@11.0.16/java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:917)
at java.base@11.0.16/java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1240)
at java.base@11.0.16/java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:267)
at org.apache.felix.framework.Felix.acquireBundleLock(Felix.java:5522)
at org.apache.felix.framework.Felix.startBundle(Felix.java:2110)
at org.apache.felix.framework.BundleImpl.start(BundleImpl.java:998)
at org.apache.felix.framework.BundleImpl.start(BundleImpl.java:984)
at lucee.runtime.osgi.OSGiUtil.extractAndLoadBundles(OSGiUtil.java:1979)
at lucee.runtime.PageContextImpl.getResourceClassLoader(PageContextImpl.java:3502)
at lucee.runtime.PageContextImpl.getClassLoader(PageContextImpl.java:3488)
at lucee.runtime.functions.other.JavaProxy.loadClassByPath(JavaProxy.java:126)
at lucee.runtime.functions.other.JavaProxy.loadClass(JavaProxy.java:65)
at lucee.runtime.functions.other.JavaProxy.call(JavaProxy.java:60)
at lucee.runtime.functions.other.CreateObject.doJava(CreateObject.java:140)
at lucee.runtime.functions.other.CreateObject.call(CreateObject.java:62)
at lucee.runtime.functions.other.CreateObject.call(CreateObject.java:49)
at util.datetimeutil_cfc$cf.initComponent(/basecom/util/DateTimeUtil.cfc:118)
Line 118 in DateTimeUtil.cfc is only doing:
javaTimeZone = createObject("java", "java.util.TimeZone");
Now, that is outside of any method/function and is just in the body of the CFC. I’m going to move it into a constructor method to see if that helps (sketched below), but I’m not sure why that would be causing issues. What’s odd is that DateTimeUtil.cfc is a singleton and should be kept in the FW/1 bean cache. It gets loaded when the application spins up, so it should never have to re-initialize, and I’m not sure why multiple threads would be trying to run that branch of logic.
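Here’s roughly the change I’m planning, just a sketch with illustrative names (the real CFC obviously has more in it):
component {
	function init() {
		// cache the Java class reference once in the constructor instead of the CFC body
		variables.javaTimeZone = createObject("java", "java.util.TimeZone");
		return this;
	}
}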
Anyone have any ideas on how I can troubleshoot this issue further?
Looking through the catalina.out log file, right before things go haywire, I see a:
java.lang.ThreadDeath
at java.base/java.lang.Thread.stop(Thread.java:937)
at lucee.commons.io.SystemUtil._stop(SystemUtil.java:1342)
at lucee.commons.io.SystemUtil.stop(SystemUtil.java:1333)
at lucee.commons.io.SystemUtil.stop(SystemUtil.java:1318)
at lucee.runtime.engine.Controler.run(Controler.java:167)
I believe this is coming from a cfthread which didn’t have requestTimeout="XXXX" set, so it was timing out (this thread needs a long timeout because it runs a process that can take many minutes to complete). Inside that thread there is also a lock statement (sketched below).
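To illustrate the structure (a sketch only; the names and the timeout value are illustrative, and the cfsetting line is the part that was missing):
<cfthread name="longRunningProcess" action="run">
	<!--- this is the request timeout the thread was missing; the value is illustrative --->
	<cfsetting requestTimeout="3600">
	<cflock name="myUniqueProcessLock" type="exclusive" timeout="60">
		<!--- long-running work that can take many minutes --->
	</cflock>
</cfthread>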
Is it possible Lucee is killing the thread inside the lock and that’s causing issues inside the JVM?
This particular lock is a named lock and is pretty unique. It definitely shouldn’t be the lock behind the java.lang.Thread.State: WAITING states I’m seeing, but if the ThreadDeath is breaking the locking mechanism in general, that could be the issue.
Is this possible?
@cfmitrah, I appreciate the feedback. I’ve got a fix going out tomorrow that I believe will resolve the java.lang.ThreadDeath issues, so hopefully I can at least see whether the ThreadDeath was causing the problem.
@dswitzer thanks for your input on this issue.
Can you please create a ticket for this in Jira? https://luceeserver.atlassian.net/
If I can find a way to recreate the problem, I surely will!
Since we’re still seeing issues with Lucee’s class loader blocking threads, I’ve filed a Jira issue for this problem:
https://luceeserver.atlassian.net/browse/LDEV-4905