JDBC connection timeout causing server hang on all subsequent requests (stuck at lucee.runtime.db.DataSourceSupport)

Environment: Lucee 5.2.1.9 running on Windows Server 2012, using the jTDS JDBC driver to connect to SQL Server 2016.

We use Veeam to back up our servers running on VMware nightly. The backup process can sometimes trigger what is called a “disk stun” while it consolidates the live VM disks. Essentially, the VM is paused for a fraction of a second and resumes immediately.

Every now and then this is enough to trigger a timeout when connecting to our database and the following exception is raised:

Network error IOException: Connection timed out:connect

SQLState string 08S01

lucee.runtime.exp.DatabaseException: Network error IOException: Connection timed out: connect at net.sourceforge.jtds.jdbc.JtdsConnection.<init>(JtdsConnection.java:436) at **net.sourceforge.jtds.jdbc.Driver.connect(Driver.java:184) at lucee.runtime.db.DataSourceSupport._getConnection(DataSourceSupport.java:105) at lucee.runtime.db.DataSourceSupport.getConnection(DataSourceSupport.java:88) at lucee.runtime.db.DatasourceConnectionPool.loadDatasourceConnection(DatasourceConnectionPool.java:124)** at lucee.runtime.db.DatasourceConnectionPool.getDatasourceConnection(DatasourceConnectionPool.java:99) at lucee.runtime.db.DatasourceManagerImpl.getConnection(DatasourceManagerImpl.java:73) at 
<snip>

To me that looks like a regular network timeout. However, once that happens, every other request that needs a database connection hangs until we restart the service.

catalina.[date].log fills up with stack traces, all pointing to lucee.runtime.db.DatasourceConnectionPool.java

Example from catalina:

“Thread-104035”
lucee.runtime.db.DatasourceConnectionPool.getDatasourceConnection(DatasourceConnectionPool.java:77)
lucee.runtime.db.DatasourceManagerImpl.getConnection(DatasourceManagerImpl.java:73)
lucee.runtime.tag.Query.executeDatasoure(Query.java:847)

We get as many of those as there are requests using the database until we restart.

Looking at DatasourceConnectionPool.java, I can see that this method is synchronized, which I think might be part of the issue?

The first timeout exception occurs in the same synchronized function, at line 88, when trying to get a connection.

// get an existing connection
while(!stack.isEmpty()) {
  DatasourceConnection dc=(DatasourceConnection) stack.get(); //this line throws a SQLException.
  if(dc!=null){
    rtn=dc;
    break;
  }
}

With the whole thing running in a synchronized function, is it perhaps possible that the following calls get stuck waiting for the thread that received the exception to finish?
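To illustrate the question, here is a minimal sketch of how callers queue up behind a synchronized method while one thread is still inside it. This is not Lucee's code: `SynchronizedBlockingDemo`, its `getConnection(boolean)` method and the latches are hypothetical stand-ins, and the slow connect is simulated. It assumes the connect attempt happens while the monitor is held, which is what the stack trace above suggests.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class SynchronizedBlockingDemo {
    private final CountDownLatch insideLock = new CountDownLatch(1);
    private final CountDownLatch finishConnect = new CountDownLatch(1);

    // Hypothetical stand-in for a synchronized pool method.
    // The "slow" caller simulates a connect attempt that has not returned yet.
    synchronized void getConnection(boolean slow) {
        if (slow) {
            insideLock.countDown(); // signal that the monitor is now held
            try {
                finishConnect.await(); // simulated pending connect
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    // Returns the state observed for a second caller while the first one
    // is still inside the synchronized method.
    static Thread.State observeWaiterState() {
        SynchronizedBlockingDemo pool = new SynchronizedBlockingDemo();
        Thread holder = new Thread(() -> pool.getConnection(true));
        Thread waiter = new Thread(() -> pool.getConnection(false));
        Thread.State seen = null;
        try {
            holder.start();
            pool.insideLock.await(); // holder owns the monitor from here on
            waiter.start();
            long deadline = System.nanoTime() + TimeUnit.SECONDS.toNanos(5);
            while (waiter.getState() != Thread.State.BLOCKED
                    && System.nanoTime() < deadline) {
                Thread.sleep(10);
            }
            seen = waiter.getState();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            pool.finishConnect.countDown(); // let the simulated connect finish
        }
        try {
            holder.join();
            waiter.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println("second caller state: " + observeWaiterState());
    }
}
```

If the real pool behaves like this sketch, requests would only pile up for as long as the first thread stays inside the method, so a connect attempt that never returns (or takes very long to time out) would look like a permanent hang.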

From my understanding of Java’s synchronized methods, the lock should be released if the method ends abruptly (i.e. an exception is thrown). The jTDS driver does throw a SQLException, but I am not sure if I got that right, or if perhaps the Lucee call would need to handle the exception gracefully?
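As a sanity check on that understanding, the following sketch throws a SQLException out of a synchronized method and then has a second thread call the same method. Again, this is not Lucee's code: `SynchronizedReleaseDemo` and its methods are hypothetical, and the exception message is only borrowed from the error above.

```java
import java.sql.SQLException;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class SynchronizedReleaseDemo {
    private boolean failNext = true;

    // Hypothetical stand-in for a synchronized pool method:
    // the first call fails the way a timed-out connect would.
    synchronized String getConnection() throws SQLException {
        if (failNext) {
            failNext = false;
            throw new SQLException("Connection timed out: connect", "08S01");
        }
        return "connection";
    }

    // Returns true if a second thread can enter the synchronized method
    // after the first call ended with an exception.
    static boolean secondCallerGetsLock() {
        SynchronizedReleaseDemo pool = new SynchronizedReleaseDemo();
        try {
            pool.getConnection();
            return false; // the first call was expected to throw
        } catch (SQLException expected) {
            // the monitor is released as the exception leaves the method
        }
        CountDownLatch acquired = new CountDownLatch(1);
        Thread second = new Thread(() -> {
            try {
                pool.getConnection();
                acquired.countDown();
            } catch (SQLException e) {
                // not expected on the second call in this sketch
            }
        });
        second.setDaemon(true);
        second.start();
        try {
            return acquired.await(2, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("second caller acquired the monitor: "
                + secondCallerGetsLock());
    }
}
```

The second thread gets in, so an exception on its own should not leave the monitor held. That points the suspicion at a thread that is still inside the method, or at state left behind by the failed call, rather than at the exception itself.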
