We’ve been having a strange issue with our production servers ever since we moved from Railo over to Lucee. The problem is intermittent, and we can’t quite nail down what exactly is causing it. I have been looking into it and can recognize the symptoms that lead to it crashing, but I’m not sure how to go about troubleshooting it and was looking for some ideas.
After some period of time, the server will sometimes become unresponsive to web requests. Requests hang until they time out. Once this starts happening, no further requests are accepted.
Using FusionReactor, I have noticed that servers which develop this issue have a high number of threads showing under Resources > Threads. Specifically, a high number of ajp-nio-8009-exec-XXX threads seems to precede the issue. I observed a server earlier today which had ~150 of these threads in the TIMED_WAITING state. Soon after that it crashed. After the crash, once the requests have expired, these threads all get set to WAITING. (Example capture from FR below)
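To compare numbers across servers without relying on the FR screen, the same tally could be taken from a plain JVM thread dump. This is only a rough sketch: it assumes the dump comes from the JDK's `jstack <pid>` in its usual text layout (a quoted thread name on one line, a `java.lang.Thread.State:` line after it), and the function name and the `ajp-nio-8009-exec-` default prefix are just illustrative choices based on the thread names above.

```python
import re
import sys
from collections import Counter

# A thread entry in a jstack dump starts with the quoted thread name...
THREAD_HEADER = re.compile(r'^"(?P<name>[^"]+)"')
# ...and is followed by an indented state line such as
#    java.lang.Thread.State: TIMED_WAITING (parking)
THREAD_STATE = re.compile(r'^\s+java\.lang\.Thread\.State: (?P<state>[A-Z_]+)')


def count_thread_states(dump_text, name_prefix="ajp-nio-8009-exec-"):
    """Count thread states for threads whose name starts with name_prefix.

    The prefix default is an assumption taken from the thread names
    seen in FusionReactor; adjust it for other connectors.
    """
    counts = Counter()
    current_name = None
    for line in dump_text.splitlines():
        header = THREAD_HEADER.match(line)
        if header:
            current_name = header.group("name")
            continue
        state = THREAD_STATE.match(line)
        if state and current_name is not None:
            if current_name.startswith(name_prefix):
                counts[state.group("state")] += 1
            # Only the first state line after a header belongs to that thread.
            current_name = None
    return counts


if __name__ == "__main__":
    # Usage: python count_states.py dump.txt
    if len(sys.argv) > 1:
        with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
            for state, n in count_thread_states(fh.read()).most_common():
                print(f"{state}: {n}")
```

Running this against dumps taken every few minutes would show whether the TIMED_WAITING count climbs steadily before a hang or jumps all at once.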
We aren’t sure if this is a problem with Lucee, Tomcat, the Boncode connector, or what. IIS is still responsive. It will serve up static html/txt files just fine. FusionReactor can still pull data from the server, so it seems like Lucee is still running fine. This leads us to believe the issue lies with Boncode and/or Tomcat, but we just don’t know how to go about testing/troubleshooting.
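One check that might help separate the Boncode side from the Tomcat side is whether the AJP port still accepts TCP connections while requests are hanging. Below is a minimal sketch of that probe. The port 8009 comes from the thread names above; the host `127.0.0.1`, the timeout, and the function name are assumptions for illustration.

```python
import socket


def port_accepts_connections(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # Assumed values: AJP connector on the local machine, port 8009.
    ok = port_accepts_connections("127.0.0.1", 8009)
    print("AJP port accepts connections" if ok else "AJP port is not accepting connections")
```

A successful connect only shows that something is still listening on the port; it does not prove a worker thread will pick the request up. A refused or timed-out connect during a hang would at least point toward Tomcat rather than IIS.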
Mainly we’re looking for ideas for things we can look at.
We also have some Railo servers in production currently, and they don’t have this problem. They run Railo 4.1, which uses the older Java 7 and an older Tomcat version, so maybe that’s why. I also noticed that the threads on those servers are named differently (ajp-bio instead of ajp-nio; probably irrelevant, but who knows).
I’m happy to provide any other details that are needed, but I wasn’t sure what info would be helpful.