Intermittent long delays between running cfquery when switching datasource mid-request

OS: Windows
Java Version: 11
Tomcat Version: 9
Lucee Version: 6.2.6.19

Dear fellow Devs,

We are experiencing regular (~twice a week) occasions when our Lucee servers will see a sudden rise in blocked threads and an associated degradation of performance. The blocked threads accumulate until usually a restart of Tomcat is required to bring it back to good performance.

This began after updating from Lucee 5 up to 6.2 a few months ago, and recent updates have possibly reduced the impact of these issues in that the servers seem better able to cope with the large number of blocked threads but it still impacts the application enough to require Tomcat restarts.

In Fusion Reactor we can see that the blocked threads are all stuck at a similar point - they are blocked between having finished one cfquery against a datasource, and then running another query against a different datasource. Once the second query is run then the request will go off and complete. This shows up like the below in Fusion Reactor profiling:

100.0% - 59.848s lucee.runtime.tag.Query._doEndTag(Query.java)
100.0% - 59.848s lucee.runtime.tag.Query.executeDatasoure(Query.java)
100.0% - 59.848s lucee.runtime.db.DatasourceManagerImpl.getConnection(DatasourceManagerImpl.java)
100.0% - 59.848s lucee.runtime.config.ConfigWebImpl.getDatasourceConnectionPool(ConfigWebImpl.java)
100.0% - 59.848s lucee.runtime.config.ConfigImpl.getDatasourceConnectionPool(ConfigImpl.java)
100.0% - 50.444s Waiting on lock <0x599bf1cb> (a java.lang.String)

These are the typical settings on the datasources for connection limits:

Connection Limit: inf
Connection idle timeout (in minutes): 20
Connection timeout: inf
Max idle connections: default
Min idle connections: default

Any advice on troubleshooting or possible causes would be very gratefully received, thank you.

While someone else may readily recognize the issue and offer a solution, until then it may help to hear what kind of db (driver) you’re using. And are the two dsn’s using the same driver? Are they on the same server as each other? (That last question may be irrelevant, but worth asking.)

(And are you saying the request where it happens always hangs up or is it sporadic? If it does always happen, have you confirmed what happens if you ONLY have a query to the “second” dsn?)

Finally, while you’ve shared the fr profile, what would be more valuable is the stack trace, taken while the request is hung. It’s one of the buttons on the right of a running request in FR.

You may see it indicate that the current request is blocked by another. It will show the jvm thread id. You can then go back to the running requests list to find it: the thread ID is listed a couple columns left of the url for the request, and stack trace that.

It might help if you’d share the first few lines of those stack traces.