Since migrating from ACF to Lucee, we’ve been seeing Lucee occasionally hang up and stop processing requests. When this happens, FusionReactor becomes basically unusable, so it’s hard to gather exactly what is happening.
What I can gather from the FusionReactor profile is that requests are hanging in areas like:
Waiting on lock <0x799570f> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
and
java.lang.ClassLoader.loadClass(ClassLoader.java:569)
- waiting on <0x330e0f60> (a lucee.commons.io.res.util.ResourceClassLoader held by thread 596892, http-nio-8080-exec-26)
java.lang.ClassLoader.loadClass(ClassLoader.java:522)
lucee.commons.lang.ClassUtil$ClassLoaderBasedClassLoading.loadClass(ClassUtil.java:896)
lucee.commons.lang.ClassUtil.__loadClass(ClassUtil.java:371)
lucee.commons.lang.ClassUtil._loadClass(ClassUtil.java:355)
lucee.commons.lang.ClassUtil.loadClass(ClassUtil.java:247)
lucee.commons.lang.ClassUtil.loadClass(ClassUtil.java:236)
lucee.runtime.reflection.Reflector.isInstaneOf(Reflector.java:131)
lucee.runtime.functions.decision.IsInstanceOf.call(IsInstanceOf.java:50)
It looks like there are some single-threaded locks that Lucee is waiting on which never free up.
I searched through Jira and the forums, but didn’t find anything that matches what I’m seeing.
What should I do to troubleshoot the issue further? What might be the cause?
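Since FusionReactor is mostly unusable while this is happening, one idea I’ve had is to capture thread dumps another way. This is just a sketch (it assumes at least a few request threads are still free to serve a small diagnostic .cfm) using the JVM’s ThreadMXBean:
// sketch: dump every thread's stack, including locked monitors/synchronizers, to a log
threadBean = createObject("java", "java.lang.management.ManagementFactory").getThreadMXBean();
infos = threadBean.dumpAllThreads(true, true);
for (info in infos) {
	writeLog(file="threaddump", text=info.toString());
}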
OS: Linux (3.10.0-1160.71.1.el7.x86_64) 64bit
Java Version: 11.0.16 (Red Hat, Inc.) 64bit
Tomcat Version: Apache Tomcat/9.0.63
Lucee Version: Lucee 5.3.9.141
Thanks!
-Dan
Try the latest snapshots!
Possible deadlock with PhysicalClassLoader [LDEV-4169] - Lucee
So this happened again. This time I was able to get some stacktraces from running requests. Most of them looked like this:
"http-nio-8080-exec-96" Id=546383 WAITING on java.util.concurrent.locks.ReentrantLock$FairSync@27171041
java.lang.Thread.State: WAITING
at java.base@11.0.16/jdk.internal.misc.Unsafe.park(Native Method)
- waiting on java.util.concurrent.locks.ReentrantLock$FairSync@27171041
at java.base@11.0.16/java.util.concurrent.locks.LockSupport.park(LockSupport.java:194)
at java.base@11.0.16/java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:885)
at java.base@11.0.16/java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:917)
at java.base@11.0.16/java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1240)
at java.base@11.0.16/java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:267)
at org.apache.felix.framework.Felix.acquireBundleLock(Felix.java:5522)
at org.apache.felix.framework.Felix.startBundle(Felix.java:2110)
at org.apache.felix.framework.BundleImpl.start(BundleImpl.java:998)
at org.apache.felix.framework.BundleImpl.start(BundleImpl.java:984)
at lucee.runtime.osgi.OSGiUtil.extractAndLoadBundles(OSGiUtil.java:1979)
at lucee.runtime.PageContextImpl.getResourceClassLoader(PageContextImpl.java:3502)
at lucee.runtime.PageContextImpl.getClassLoader(PageContextImpl.java:3488)
at lucee.runtime.functions.other.JavaProxy.loadClassByPath(JavaProxy.java:126)
at lucee.runtime.functions.other.JavaProxy.loadClass(JavaProxy.java:65)
at lucee.runtime.functions.other.JavaProxy.call(JavaProxy.java:60)
at lucee.runtime.functions.other.CreateObject.doJava(CreateObject.java:140)
at lucee.runtime.functions.other.CreateObject.call(CreateObject.java:62)
at lucee.runtime.functions.other.CreateObject.call(CreateObject.java:49)
at util.datetimeutil_cfc$cf.initComponent(/basecom/util/DateTimeUtil.cfc:118)
Line 118 in DateTimeUtil.cfc is only doing:
javaTimeZone = createObject("java", "java.util.TimeZone");
Now, that is outside of any method/function and is just in the body of the CFC. I’m going to move it into a constructor method to see if that helps (sketched below), but I’m not sure why that would be causing issues. What’s odd is that DateTimeUtil.cfc is a singleton and should be kept in the FW/1 bean cache. It gets loaded when the application spins up, so it should never have to re-initialize, and I’m not sure why multiple threads would be trying to run that branch of logic.
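Here’s roughly the change I’m planning, just a sketch with illustrative names (the real CFC obviously has more in it):
component {
	function init() {
		// cache the Java class reference once in the constructor instead of the CFC body
		variables.javaTimeZone = createObject("java", "java.util.TimeZone");
		return this;
	}
}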
Anyone have any ideas on how I can troubleshoot this issue further?
Looking through the catalina.out log file, right before things go haywire, I see a:
java.lang.ThreadDeath
at java.base/java.lang.Thread.stop(Thread.java:937)
at lucee.commons.io.SystemUtil._stop(SystemUtil.java:1342)
at lucee.commons.io.SystemUtil.stop(SystemUtil.java:1333)
at lucee.commons.io.SystemUtil.stop(SystemUtil.java:1318)
at lucee.runtime.engine.Controler.run(Controler.java:167)
I believe this is coming from a cfthread which didn’t have requestTimeout="XXXX" set, so it was timing out (this thread needs a long timeout because it runs a process that can take many minutes to complete). Inside that thread there is also a lock statement (sketched below).
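To illustrate the structure (a sketch only; the names and the timeout value are illustrative, and the cfsetting line is the part that was missing):
<cfthread name="longRunningProcess" action="run">
	<!--- this is the request timeout the thread was missing; the value is illustrative --->
	<cfsetting requestTimeout="3600">
	<cflock name="myUniqueProcessLock" type="exclusive" timeout="60">
		<!--- long-running work that can take many minutes --->
	</cflock>
</cfthread>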
Is it possible Lucee is killing the thread inside the lock and that’s causing issues inside the JVM?
This particular lock is a named lock and is pretty unique. It definitely shouldn’t be the lock behind the java.lang.Thread.State: WAITING states I’m seeing, but if the ThreadDeath is breaking the locking mechanism in general, that could be the issue.
Is this possible?
@cfmitrah, I appreciate the feedback. I’ve got a fix going out tomorrow that I believe will resolve the java.lang.ThreadDeath issues, so hopefully I can at least see whether the ThreadDeath was causing the problem.
@dswitzer thanks for your input on this issue.
Can you please create a ticket for this in Jira? https://luceeserver.atlassian.net/
If I can find a way to recreate the problem, I surely will!
Since we’re still seeing issues with Lucee’s class loader blocking threads, I’ve filed a Jira issue for this problem:
https://luceeserver.atlassian.net/browse/LDEV-4905