Datasource Connections not closing

MShayOffice · May 14, 2024, 8:12pm

I work on a site with a few hundred daily users, and a month or two ago, we ported it over to a newly set up AWS EC2 instance, Windows Server 2022 Datacenter + IIS + Tomcat.

Every few days since then, we’ve been getting a wave of timeouts from the database, and had to go in and manually restart the Tomcat service to fix it.

From what we can tell, our MSSQL datasource connections just seem to arbitrarily stop closing. Then they’re maxing out or slowing down, causing the connections to time out, resulting in a waves of error emails.
Originally the datasource had the default connection limit of 100, so I bumped that up to Max, though the timeouts kept coming. I looked at the Datasources page during one of these timeout events, and it was up around 200 Open Connections and steadily increasing.

I eventually realized that the default MSSQL driver said jre8 and the site is running on Java 11. I updated to a newer jre11 driver within Lucee, and it looked like it fixed it for a few days, but then things started slowing down and we started getting timeouts again.

It wasn’t a huge swarm of errors like before, but I saw that 2000+ connections were now open on the Datasources tab on our site’s Lucee Web dashboard. Every time I opened the page, the number would increment, indicating that we weren’t necessarily under some sort of attack, but that it just stopped closing connections at some point and had been building them up.

We experimented a little bit when deploying to this new server, opting for the Ortus ORM Extension and for Lucee 6, instead of Hibernate on Lucee 5 like we were using originally. I’ve been doing small incremental changes to try and pinpoint the cause each time this connection problem starts back up, and I’m going to try switching ORM plugins and downgrading Lucee after it starts failing a few more times.

We set up Fusion-Reactor to help troubleshoot, and it helped me narrow things down a little bit, but it’s still unclear to me what’s causing it or what exactly is going on. Before fixing the SQL driver, I think JDBC was failing completely and not restarting/reconnecting, but I can’t figure out a good way to see more information on these stale connections, the status of the connector or drivers, anything that might help narrow things down.

OS: Windows Server 2022 Datacenter
Java Version: 11.0.21 (Eclipse Adoptium) 64bit
Tomcat Version: 9.0
Lucee Version: 6.0.1.83

Microsoft SQL Server (Vendor Microsoft) Version: 12.4.2.jre11 (Server)
Ortus ORM Extension: 6.5.2 (Web)

MShayOffice · June 3, 2024, 10:23pm

We tried switching back from the Ortus ORM to Hibernate. The errors persisted, but presented themselves a bit differently, lagging out the site over time instead of spamming our error logs with timeouts.

We have some large tables with some slow queries, and when a timeout occurred from those, I think it would abandon all open connections and open new ones, lagging down the system and causing more frequent timeouts, abandoning more connections, leading to a runaway error.

Eventually, I did some extra research and assessed our codebase and it seems that we weren’t actually using the ORM at all, it was an implementation a developer started but didn’t get very far. After phasing it out, we haven’t had this strange timeout issue anymore.

I don’t think this thread needs support anymore, but I just wanted to follow up with our experience and solution in case it helps out anyone in the future.