Database connections becoming idle and maxing out regularly

We are on Lucee 5.2.6.59 and using PostgreSQL 9.4 with the PostgreSQL JDBC driver version 9.4.1212 installed.
Some months ago we started running into an issue where all of the allowed connections (configured to 200) to our Postgres instance were being consumed. When we view those connections in Postgres, nearly every one of them shows as idle, some for up to a day. We have been unable to determine what might be causing the connections to hang and become idle. As a band-aid we set up a scheduled event to kill any connection that has been idle for longer than 1 hour. That has helped us avoid maxing out the connections fairly well, but occasionally it has not been enough when our websites are under high load. We are not using any ORM, just straight cfquery in our applications. Our connection timeout setting has been 1 minute; we have adjusted it to 0 from time to time to see if that would help, but it doesn't appear to change anything.
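
For reference, the scheduled cleanup amounts to a query along these lines (just a sketch; the 1 hour threshold is the one mentioned above, the rest is the standard pg_stat_activity / pg_terminate_backend approach):

-- Terminate backends that have been idle for more than 1 hour
-- (requires superuser or pg_signal_backend privileges)
SELECT pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE state = 'idle'
AND state_change < now() - interval '1 hour'
AND pid <> pg_backend_pid();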

Any ideas, or suggestions?

Have you tried updating to the latest 5.2.9 stable release?

Just updated to 5.2.9.31. Going to see if that helps. Thanks.

Hi,

Just for what it’s worth: we’re on Lucee 5.2.9.31 using org.postgresql.jdbc42 with bundleVersion 9.4.1212 and PostgreSQL 10.6. We have set a sensible connection limit in the datasource configuration and a connection timeout of 1 minute. We’re seeing a couple of hundred JDBC queries every second on average, and a fair amount of this goes to our PostgreSQL database. We’re also using MySQL and MongoDB, so even with FusionReactor running it’s not so easy to break the activity down by driver type, but PostgreSQL is our main content database, so I assume it’s somewhere around 300 queries per second on average for each of our Lucee servers. The DB server hardly ever goes over 200 concurrent connections. It is also being hit by non-CFML code, so in total it’s seeing some 2.5-3k transactions/second.

We haven’t seen the behaviour you describe at all; the servers have been running fairly smoothly for over a week (at the moment some 580M total JDBC queries in the last 11 days). We’re also restarting our instances for updates at least every other week, so if this only happens with longer uptimes, we probably wouldn’t notice.

It may in fact not be related to a bug in Lucee at all - it could also be a side effect of a denial-of-service attack. If something spins up a lot of (preferably slow) requests (google slowloris/slow POST), Lucee may open an equal number of JDBC connections for the pool. As far as I understand, those are only closed once a connection has been idle for longer than the connection timeout you have set in the datasource configuration. A brute-force remedy would be to set the timeout to 0, so the connection would be freed after use. A more workable solution might be some sort of WAF, mod_security or something like that.

You could also try to dig into the info PostgreSQL provides to find out what each connection was last running when the connection limit is reached (assuming you can still run queries as a superuser), for example with a query like this:

SELECT datname AS database,
  pid AS backend_pid,
  client_addr::text AS client,
  extract(epoch FROM (now() - query_start)) AS runtime,
  query AS statement
FROM pg_stat_activity
-- use this filter instead to see only the active (non-idle) queries:
-- WHERE NOT (query LIKE '<IDLE>%') AND state <> 'idle'
WHERE state = 'idle'
ORDER BY runtime DESC;
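
If the idle sessions pile up again, another angle (just a sketch, nothing Lucee-specific) is to group the current sessions by client, to see whether a single Lucee node or some non-CFML client is holding most of them:

-- Count sessions per client address, user and state
SELECT client_addr::text AS client,
  usename,
  state,
  count(*) AS connections
FROM pg_stat_activity
GROUP BY client_addr, usename, state
ORDER BY connections DESC;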

Also, FusionReactor may provide some insights into what’s going on. I hope that you find some angle to solve your problem.

Kind regards

Markus