Apparently, in Lucee the entityLoad call gets Hibernate to keep running its classloader at runtime

BK_BK · November 18, 2022, 5:05pm

Apparently, in Lucee the entityLoad call gets Hibernate to keep running its classloader at runtime. The result is that one request would hold a handle on the Hibernate session, blocking access to any other request threads.

Not only that, the running (blocking) thread takes too long. Sometimes up to minutes. Evidently because it gets Hibernate to load a class. The blocked requests (invariable entityLoad calls themselves) eventually time out.

The higher the load, the more likely this issue will be triggered. Which is why I have not yet been able to repro the issue. As I have not been able to simulate the traffic pattern that causes it.

Nevertheless, I have discussed the issue earlier in the post " Are Lucee’s custom class loaders single-threaded or multithreaded?"
The stacktrace typically ends in:

…
[entityLoad call from custom, user-defined CFC]
lucee.runtime.functions.orm.EntityLoad.call(EntityLoad.java:70)
org.lucee.extension.orm.hibernate.HibernateORMSession.loadAsArray(HibernateORMSession.java:676)
org.lucee.extension.orm.hibernate.HibernateORMSession.load(HibernateORMSession.java:851)
org.hibernate.internal.CriteriaImpl.list(CriteriaImpl.java:370)
org.hibernate.internal.SessionImpl.list(SessionImpl.java:1851)
org.hibernate.metamodel.internal.MetamodelImpl.getImplementors(MetamodelImpl.java:661)
org.hibernate.boot.registry.classloading.internal.ClassLoaderServiceImpl.classForName(ClassLoaderServiceImpl.java:130)
java.lang.Class.forName(Unknown Source)
java.lang.Class.forName0(Native Method)

The question is, why is it that whenever Lucee runs entityLoad it gets Hibernate to run the classloader?

One consequence is that we, on occasion, have to restart Lucee instances. That is the order of the day now for our application. It is becoming a deal breaker for the development team, making Lucee adoption difficult.

Please kindly offer any suggestions you may have.

I have attached a typical thread dump and error report.
2 examples of blocked calls:

/* /nl/dvnt/stdmtr/service/UserData.cfc:949 */
var aResult=entityLoad("UserResults", {userId: nUserId, exerciseUUID: sExerciseUuid, exerciseId: nExerciseId});

/* /nl/dvnt/stdmtr/service/UserData.cfc:992 */
var aResult=entityLoad("UserAnswers", {userId: nUserId, exerciseUUID: sExerciseUuid, exerciseId: nExerciseId}, "sortOrder asc");

Lucee 5.3.9.163-SNAPSHOT on CommandBox
Hibernate 5.4.29.15-BETA
Windows Server 2016

Thread Dump - sm1-2 10.05 17-11-2022.txt (735.3 KB)
timeout-error-report.txt (11.7 KB)

michaelborn_me · November 18, 2022, 7:20pm

I am neither a Java guru nor a classloading expert, but a few things seem apparent here:

Hibernate uses a class loader to load classes. This is obvious and expected behavior. You’ve asked Hibernate to load the UserResults entity, which is a Java class under the hood. There is no bug or error here.
The locking or timeout issues may have nothing to do with Hibernate’s behavior, and everything to do with your own code.

My suggestions:

Take a close look at every line of code in those entities. Remove all isInstanceOf() and createObject( "java", ...) uses, as I’ve seen these to be performance bottlenecks.
Take a close look at any code affecting the ORM session before the entityLoad() calls. Look for ormFlush(), transaction ends, ormClearSession(), anything that touches the ORM session. Just a hunch as I’ve fought with a lot of Hibernate session goofups in Lucee.
Try dropping Lucee down to 5.3.9.160 or lower, just in case you’re running into a regression from LDEV-4169.
Remember that code hitting a lock is not the issue… it’s the code creating the lock you need to identify and debug. Why is some code creating a lock, and not closing that lock? Look for any lock{} usages, file IO, DB calls, transactions, basically anything that could create a lock.

As gently as I can say this… I think you’re trying to treat the symptom (i.e. entityLoad() takes a long time / times out) instead of the cause. Look for the cause, and the symptoms will sort themselves out.

BK_BK · November 21, 2022, 2:36pm

Hi @michaelborn_me ,
Thank you for sharing your insights on this.

Let me begin with your closing remark. I can assure you that my search is NOT for a treatment of the symptoms. In fact, I am surprised that you have that impression.

My search is for the cause. If you reread my posts on the subject, you will see that one question recurs more frequently than any other. Namely, why.

I ask because, firstly, our application makes abundant use of Hibernate. Solving the problem for just 3 entities is analogous to using your fingers to plug holes in the dyke, as the Dutch would say.

Secondly, we see classloader-related issues not only with Hibernate. But also with isInstanceOf(), for example, which you in fact mention as a performance bottleneck. In any case, I have verified that neither isInstanceOf() nor createObject( "java", ...) occurs in the entity CFCs involved in this problem.

The present question is not the only one I have asked about Lucee’s Hibernate implementation. Elsewhere in this forum I asked why you get "lucee.runtime.exp.CasterException: null can not be casted to a Struct" when you use named arguments in entityLoad(). I included a way to reproduce the exception. See Birds example there.

Might the exception be related to the current issue? After all, arguments are related to constructors.

That said, I am heeding your advice and taking another look at:

any code affecting the ORM session before the entityLoad() calls;
ormFlush(), transaction ends, ormClearSession(), anything that touches the ORM session;
lock usages.

Plenty to go on.
Thanks for that.

michaelborn_me · November 21, 2022, 7:53pm

Excellent! “Why” is the question to ask here!

Be careful not to confuse “usage” with “issues”, because, again, classloader usage is not in itself an issue. If the timeouts consistently happen at the time of classloading, I totally agree you should be suspicious. But again, this appears to be the effect of locking, and not the cause of the timeouts you are seeing.

Also, I’ve read that CFML timeouts can be deceiving. I’m hoping @Brad_Wood can chime in on that one. i.e. Is it still true that CF engines aren’t great at catching a long-running thread immediately?

No, not in any way. Do you normally consider argument casting to be related to thread timeouts?

Brad mentioned on your other post that you should write this up as a Lucee bug ticket. I would second that.

Don’t forget rolling back your Lucee version to see what, if any, different results you may obtain.