Lucee hanging on getDatasourceConnection

I’m seeing something which I think it similar to the issue described in DatasourceConnectionPool Exception on a site we recently put in (Lucee 5.2.3.35, but also observed in 5.2.1.9). OS is RHEL 7.3, and I’m using an Oracle 12 db.

The Lucee instance seems to get into a situation where we see many threads in a BLOCKED state for either getDatasourceConnection or, or releaseDatasouceConnection (examples below - I’ve had to snip these to fit within the character limit of Discourse):

- waiting on <0x18747c11> (a lucee.runtime.db.DCStack held by thread 3052740, ajp-nio-9409-exec-795)
lucee.runtime.db.DatasourceManagerImpl.releaseConnection(DatasourceManagerImpl.java:165)
lucee.runtime.tag.Query.executeDatasoure(Query.java:916)
lucee.runtime.tag.Query.doEndTag(Query.java:664)
org.harness.model.punterscornergateway_cfc$cf.udfCall2(/au/org/harness/model/PuntersCornerGateway.cfc:345)
org.harness.model.punterscornergateway_cfc$cf.udfCall(/au/org/harness/model/PuntersCornerGateway.cfc)
lucee.runtime.type.UDFImpl.implementation(UDFImpl.java:108)
lucee.runtime.type.UDFImpl._call(UDFImpl.java:367)
lucee.runtime.type.UDFImpl.call(UDFImpl.java:227)

(complete paste at https://pastebin.com/0uUeYkDz)

There’s a bunch like that above which all saying they’re waiting on ajp-nio-9409-exec-795 but when I look at that thread it’s in the “RUNNABLE” state:

"ajp-nio-9409-exec-795" Id=3052740 RUNNABLE  

State: RUNNABLE 
Deamon: true 
Blocked: 329/00:02:14.023 
Waiting: 63/00:18:12.976 
CPU: 00:00:07.405 
User: 00:00:05.080 
Memory: (883,989,736 B) 
 
java.net.SocketInputStream.socketRead0(Native Method)
java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
java.net.SocketInputStream.read(SocketInputStream.java:170)
java.net.SocketInputStream.read(SocketInputStream.java:141)
oracle.net.ns.Packet.receive(Packet.java:311)
oracle.net.ns.DataPacket.receive(DataPacket.java:105)
oracle.net.ns.NetInputStream.getNextPacket(NetInputStream.java:305)
oracle.net.ns.NetInputStream.read(NetInputStream.java:249)
oracle.net.ns.NetInputStream.read(NetInputStream.java:171)
oracle.net.ns.NetInputStream.read(NetInputStream.java:89)
oracle.jdbc.driver.T4CSocketInputStreamWrapper.readNextPacket(T4CSocketInputStreamWrapper.java:123)
oracle.jdbc.driver.T4CSocketInputStreamWrapper.read(T4CSocketInputStreamWrapper.java:79)
oracle.jdbc.driver.T4CMAREngineStream.unmarshalUB1(T4CMAREngineStream.java:429)
oracle.jdbc.driver.T4C8TTIpro.receive(T4C8TTIpro.java:127)
oracle.jdbc.driver.T4CConnection.connect(T4CConnection.java:1481)
oracle.jdbc.driver.T4CConnection.logon(T4CConnection.java:496)
oracle.jdbc.driver.PhysicalConnection.connect(PhysicalConnection.java:666)
oracle.jdbc.driver.T4CDriverExtension.getConnection(T4CDriverExtension.java:32)
oracle.jdbc.driver.OracleDriver.$fr$connect(OracleDriver.java:566)
oracle.jdbc.driver.OracleDriver.connect(OracleDriver.java)
lucee.runtime.db.DataSourceSupport._getConnection(DataSourceSupport.java:105)
lucee.runtime.db.DataSourceSupport.getConnection(DataSourceSupport.java:88)
lucee.runtime.db.DatasourceConnectionPool.loadDatasourceConnection(DatasourceConnectionPool.java:124)
lucee.runtime.db.DatasourceConnectionPool.getDatasourceConnection(DatasourceConnectionPool.java:99)
 - locked <0x18747c11> (a lucee.runtime.db.DCStack)
lucee.runtime.db.DatasourceManagerImpl.getConnection(DatasourceManagerImpl.java:76)
lucee.runtime.tag.Query.executeDatasoure(Query.java:896)
lucee.runtime.tag.Query.doEndTag(Query.java:664)
admin.model.newsgateway_cfc$cf.udfCall2(/natwebroot/admin/model/NewsGateway.cfc:665)
admin.model.newsgateway_cfc$cf.udfCall(/natwebroot/admin/model/NewsGateway.cfc)
lucee.runtime.type.UDFImpl.implementation(UDFImpl.java:108)
lucee.runtime.type.UDFImpl._call(UDFImpl.java:367)
lucee.runtime.type.UDFImpl.callWithNamedValues(UDFImpl.java:213)
lucee.runtime.ComponentImpl._call(ComponentImpl.java:698)
lucee.runtime.ComponentImpl._call(ComponentImpl.java:580)
lucee.runtime.ComponentImpl.callWithNamedValues(ComponentImpl.java:1931)
lucee.runtime.tag.Invoke.doComponent(Invoke.java:221)
lucee.runtime.tag.Invoke.doEndTag(Invoke.java:194)
national.includes.display_objects.custom.media_room270.news_article_cfm225$cf.call(/muraWRM/national/includes/display_objects/custom/media-room/news-article.cfm:18)
lucee.runtime.PageContextImpl._doInclude(PageContextImpl.java:939)
lucee.runtime.PageContextImpl._doInclude(PageContextImpl.java:833)
lucee.runtime.PageContextImpl.doInclude(PageContextImpl.java:817)
content.contentrenderer_cfc$cf.udfCallb(/mura/content/contentRenderer.cfc:2463)
content.contentrenderer_cfc$cf.udfCall(/mura/content/contentRenderer.cfc)
lucee.runtime.type.UDFImpl.implementation(UDFImpl.java:108)
lucee.runtime.type.UDFImpl._call(UDFImpl.java:367)
lucee.runtime.type.UDFImpl.callWithNamedValues(UDFImpl.java:213)
lucee.runtime.type.scope.UndefinedImpl.callWithNamedValues(UndefinedImpl.java:781)
lucee.runtime.util.VariableUtilImpl.callFunctionWithNamedValues(VariableUtilImpl.java:835)
lucee.runtime.PageContextImpl.getFunctionWithNamedValues(PageContextImpl.java:1719)
cfobject_cfc$cf.udfCall3(/mura/cfobject.cfc:228)
cfobject_cfc$cf.udfCall(/mura/cfobject.cfc)

(complete paste at https://pastebin.com/MxpEBexa)

My DBA and networking guys tell me everything they’re responsible is working normally and other ACF and Lucee instances on the box seem to continue to function normally as well, which makes me think the network connection to the database is ok, and the DB is definitely up.

Any suggestions on how I can debug this further? Can anyone see anything from the above that sheds any light on what’s going on? I’m fairly new to looking at thread dumps so I don’t know what I’m looking at really.

Thanks,
Andrew.