Dino, I’m afraid I don’t have a ready solution for your problem, though we should be able to get to one, and I have one guess for you based on what you’ve shared. More on that in a moment.
1] First, I do want to correct what seems to be a misconception from your last comment (as you struggle to find SOME explanation for the problem). Those log lines don’t reflect “a runaway thread”, nor do they imply that Lucee is necessarily “shutting down” on its own.
What I’m saying is that those lines (about possible “memory leaks”) appear pretty much WHENEVER one shuts down Lucee…and even CF. To prove the point, I literally just started and then stopped a Lucee service I have installed, and got those very lines:
25-Jul-2024 11:50:59.903 INFO [Thread-33] org.apache.coyote.AbstractProtocol.pause Pausing ProtocolHandler ["http-nio-8888"]
25-Jul-2024 11:51:00.703 INFO [Thread-33] org.apache.coyote.AbstractProtocol.pause Pausing ProtocolHandler ["ajp-nio-127.0.0.1-8029"]
25-Jul-2024 11:51:00.710 INFO [Thread-33] org.apache.catalina.core.StandardService.stopInternal Stopping service [Catalina]
25-Jul-2024 11:51:05.887 WARNING [Thread-33] org.apache.catalina.loader.WebappClassLoaderBase.clearReferencesThreads The web application [ROOT] appears to have started a thread named [FelixResolver-1] but has failed to stop it. This is very likely to create a memory leak. Stack trace of thread:
Note that was ME manually stopping the service…one I had just started, and against which not a single request had been run. (I realize someone else may chime in and say, “no, I NEVER see such lines in my catalina logs”.) I’m simply saying I find them to be VERY common (and on CF just as readily as Lucee).
And FWIW, those warnings are part of Tomcat’s built-in memory-leak detection (note the WebappClassLoaderBase.clearReferencesThreads in that log line), which works alongside the JreMemoryLeakPreventionListener that Tomcat enables by default in the server.xml. When a shutdown starts and Tomcat detects a thread the web application started but didn’t stop, it reports this…but since Tomcat is shutting down anyway, it’s a red herring.
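If you’re curious, you can confirm those listeners are declared in your own Tomcat config with something like the following (the path is just my assumption of a default Lucee installer layout on Linux; adjust it for your install):

# show the leak-prevention listeners Tomcat declares by default
grep -n "LeakPreventionListener" /opt/lucee/tomcat/conf/server.xml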
If it’s of any value, you can find log lines in other files reflecting that the service is being stopped (and I’d think manually, rather than by some automatic “shutdown”). For instance, I can see such lines in my commons-daemon log for today (in the same folder as the catalina logs), commons-daemon.2024-07-25.log, reporting the following (and since you’re on Linux, see the sketch after these lines for a rough equivalent there):
[2024-07-25 11:50:59] [info] [10732] Stopping service...
[2024-07-25 11:51:06] [info] [10732] Service stop thread completed.
[2024-07-25 11:51:07] [info] [27424] Run service finished.
[2024-07-25 11:51:08] [info] [27424] Apache Commons Daemon procrun finished.
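As noted above, a rough Linux equivalent (assuming Lucee/Tomcat runs there as a systemd-managed service, and swapping your actual service/unit name in for the placeholder) might be:

# on a systemd-based distro, show when the Lucee/Tomcat service was started or stopped recently
sudo journalctl -u <lucee-service-name> --since "2 days ago" | grep -i -E "started|stopped|stopping"

If you see “Stopping”/“Stopped” entries there at the times of your outages, then something (or someone) is deliberately stopping the service rather than it dying on its own.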
2] BTW, if you feel confident that YOU are not (and no one is) manually bringing Lucee down, here’s one possibility that may not be reflected in the Lucee/Tomcat logs: since you’re running on Linux, it COULD be the Linux “out of memory killer” mechanism detecting that your box is running low on memory…in which case it (the OS) chooses to terminate the process using the most memory (whether or not that process is the true CAUSE of the OS memory limit being reached).
You can search to find more on that. But as for finding the Linux logs that will help you know if that IS happening, check out this SO discussion.
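As a quick sketch (exact log locations and commands vary by distro, and you’ll likely need sudo), the kernel usually records OOM-killer activity where something like this can find it:

# check the kernel ring buffer for recent OOM-killer activity
sudo dmesg -T | grep -i -E "out of memory|oom-killer|killed process"

# or, on systemd-based distros, search the kernel journal instead
sudo journalctl -k | grep -i -E "out of memory|killed process"

If Lucee’s java process shows up there as a “Killed process”, that’s your smoking gun for this scenario.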
Maybe that’s what’s happening. And in that case of course the next question would be “why is memory on the box reaching the limit”. And it could be Lucee contributing to it, or something else (some people run a db on the same machine as Lucee or CF, and that could be the main contributor).
And this OOM-killer possibility would also raise the question: how much memory is on the machine? Or might it be a VM? Or a container? Or a CommandBox server? And how much heap is Lucee/Tomcat configured to use, etc.? As I’ll discuss next, some monitoring tools can help you get to the bottom of the USE of memory within Lucee/CF.
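On that last point, here’s a minimal sketch of one way to confirm what heap the running Lucee JVM is actually configured with, assuming a JDK’s jcmd is available on the box (it may not be, if only a JRE is bundled) and with <pid> standing in for Lucee’s Tomcat java process id:

# list running JVMs to find Lucee's Tomcat process id
jcmd -l

# report the JVM's effective flags, filtered down to the heap sizes
jcmd <pid> VM.flags | tr ' ' '\n' | grep -i heapsize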
3] Otherwise, perhaps none of the above answers your main question: why is Apache reporting 503 errors, or why does Apache seem unable to reach Lucee when it tries to proxy to port 8888? (Which also suggests you’ve opted not to use the AJP connector.)
In that case, I’ll say first that there can always be any of many reasons that Lucee (or CF) is not responsive to a request passed to it from a web server like Apache (or IIS). And sadly the logs (in Lucee or CF, and in Tomcat) often fail to explain the reason. It could even be that the engine IS “running” at the time…but just not “responsive” to the request passed in from the web server.
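One quick, hedged sanity check the next time Apache throws those 503s (this assumes the default 8888 HTTP connector your proxy points to, and is best run on the Lucee box itself):

# hit Lucee's Tomcat HTTP connector directly, bypassing Apache; give up after 10 seconds
curl -sS --max-time 10 -o /dev/null -w "HTTP %{http_code} in %{time_total}s\n" http://127.0.0.1:8888/

If that responds normally while Apache is still reporting 503s, the problem lies between Apache and Tomcat (proxy config, ports, timeouts). If it hangs or fails too, then Lucee/Tomcat itself is down or unresponsive, which is where the monitoring I discuss next comes in.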
And in most such cases, your best next step will usually be to implement some Lucee- (or CF-) specific monitoring, to better understand WHY it may be that CFML requests are not responding at a given moment. That could be FusionReactor or JVM tools (or the PMT in the case of CF).
Again, if the CFML engine is UP but not processing new requests, don’t presume such a monitor “can’t help” you understand what’s going on at that moment. And some of the tools track things over time and can be useful even AFTER the CFML engine is restarted, to know what was going on BEFORE that. That may be most important in your case, whether Lucee is going down on its own or someone/something is restarting it.
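As one hedged example of the free “JVM tools” route (again assuming a JDK’s tools are available, and with <pid> standing in for the Tomcat java process id): capture a thread dump or two while requests are hanging, so you can see what every Lucee/Tomcat thread was doing at that instant:

# write a timestamped thread dump of the running JVM to /tmp
jcmd <pid> Thread.print > /tmp/lucee-threads-$(date +%Y%m%d-%H%M%S).txt

# alternatively, 'kill -3 <pid>' asks the JVM to print the same dump into Tomcat's catalina.out

Comparing a couple of dumps taken 20 or 30 seconds apart will show whether request threads are stuck (and where), which is the sort of thing FusionReactor tracks for you continuously and with far less manual effort.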
Such monitoring tools vary in how easily they are implemented and how they work (as well as cost, with many being free and even FR having a 14-day free trial). I realize you didn’t want to hear “you may need to enable monitoring”, and you may not want to “dig through such stuff”.
Maybe someone else will offer some magic pill that happens to be the right antidote for you. My experience is that, since there are so many things that can go amiss, the most valuable use of time is using such tools to find the problem and solve it. I appreciate that when one is new to any such tools (even free JVM tools), it can be a hassle figuring out how to set them up, use them, and connect the dots between what they may (or may not) show and your actual problem and its resolution.
4] That’s where it can help to have someone experienced with the tools assist you in implementing and interpreting them. I do that, but sadly I can’t do it for free (as I’m busy enough doing it on a consulting basis for folks). I’m leaving all the above for you to consider, and I hope it may help (or at least point you in a better direction than the one your last post was headed in). And of course I’m happy to answer questions or offer more as may seem appropriate here.
But if you remain stuck and “just need this solved”, I offer such remote screen-share consulting help, and we may find and resolve things in less than an hour or two. More on my rates, approach, satisfaction guarantee, and online calendar at Charlie Arehart's Server Troubleshooting Consulting Services. There’s also support from others here, including Rasia (which includes the team behind Lucee).
Sorry for the long-winded answer, but since you’re stuck and no one else has replied, I thought it better to offer what I could, and well, wordy is “how I roll”. If such things were easy, smart folks like most here wouldn’t need help, or would get ready answers in places like this. Some challenges just aren’t that easy…though once we find the cause it may well be simple to resolve.