Improved controller execution logging: log when a step takes longer than a threshold

Firstly, what’s the controller thread? It’s a background thread in Lucee which

  • cleans up temp files
  • validates pooled JDBC and HTTP connections
  • expires sessions and applications
  • polls the deploy directory
  • and a range of other things

But I got tired of being asked about long-running controller threads and not being able to provide any concrete answers on how to figure out what was taking so long…

So I’ve just added some additional logging of how long each step takes:

https://luceeserver.atlassian.net/browse/LDEV-5670

When any step exceeds the default 20s threshold, it is logged to application.log at INFO level (so the log needs its level set to INFO for these entries to appear).

Since 6.2.2.62-SNAPSHOT / 7.0.0.270-SNAPSHOT there is a new env var / sys prop which lets you configure the logging threshold (in seconds) for long-running steps:

LUCEE_CONTROLLER_LOG_THRESHOLD / lucee.controller.log.threshold

The default is 20; set it to 0 to disable this logging.
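
For example, to lower the threshold to 5 seconds (5 is just an illustrative value), either of the following could be set before Lucee starts; how you pass the JVM argument (JAVA_OPTS, setenv.sh, etc.) depends on your install:

  # as an environment variable, e.g. in your service unit or container env
  LUCEE_CONTROLLER_LOG_THRESHOLD=5

  # or as a JVM system property
  -Dlucee.controller.log.threshold=5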

Here’s an example with the threshold at 1ms (set manually via code to demonstrate; the setting only accepts whole seconds, so the minimum threshold is normally 1s, i.e. 1000ms):

-------------- Start Tests -----------
"INFO","ControllerThread:1","06/26/2025","14:36:03","lucee.runtime.engine.Controler",
   "checkTempDirectorySize took 94ms"
"INFO","ControllerThread:1","06/26/2025","14:36:03","lucee.runtime.engine.Controler",
   "clearUnused took 31ms"
"INFO","ControllerThread:1","06/26/2025","14:36:03","lucee.runtime.engine.Controler",
   "clearUnusedDBConnections took 28ms"
"INFO","ControllerThread:1","06/26/2025","14:36:03","lucee.runtime.engine.Controler",
   "clearUnusedHttpConnections took 5ms"
"INFO","ControllerThread:1","06/26/2025","14:36:03","lucee.runtime.engine.Controler",
   "Web Context [D:\work\lucee6\ant] took 164ms"

It’s normal for these steps to take some time, proportional to the number of connections, temp files, onSessionEnd() / onApplicationEnd() calls etc. being processed in the background.

A server under high load may take longer to run the background controller thread, as it runs at a lower thread priority than normal requests.
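
If you want to check whether any of these entries have already shown up on a running server, grepping application.log for the controller class name should find them. A rough sketch only; the logs path below is the default for a Linux install and may differ on yours (note the class name is spelled with a single "l"):

  grep 'lucee.runtime.engine.Controler' /opt/lucee/server/lucee-server/context/logs/application.log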


@thefalken moving the conversation back here as it’s going to be lost in the RC thread

Are you seeing any of the above INFO entries?

Nope; the lines right before the latest one are our normal app logging, including one at INFO level.

"INFO","http-nio-8888-exec-14","07/07/2025","14:12:47","http://x.staging.y.co.uk:80|x.staging.y.co.uk","Secure : getFieldsForUuids @ 1751897567261 6 fname,hospital,sname,ward"
"WARN","http-nio-8888-exec-14","07/07/2025","14:12:47","http://x.staging.y.co.uk:80|x.staging.y.co.uk","waiting-list : referral 126 using fallback for reason"
"WARN","http-nio-8888-exec-14","07/07/2025","14:12:47","http://x.staging.y.co.uk:80|x.staging.y.co.uk","waiting-list : referral 223 using fallback for reason"
"ERROR","Controller","07/07/2025","14:14:59","controler","was not able to stop controller thread running for 50020ms;java.lang.Throwable;java.lang.Throwable
	at java.base/java.io.UnixFileSystem.list0(Native Method)
	at java.base/java.io.UnixFileSystem.list(Unknown Source)
	at java.base/java.io.File.normalizedList(Unknown Source)
	at java.base/java.io.File.list(Unknown Source)
	at lucee.commons.io.res.type.file.FileResource.list(FileResource.java:701)
	at lucee.commons.io.res.type.file.FileResource.listResources(FileResource.java:178)
	at lucee.commons.io.res.util.ResourceUtil.deleteFileOlderThan(ResourceUtil.java:1404)
	at lucee.commons.io.res.util.ResourceUtil.deleteFileOlderThan(ResourceUtil.java:1407)
	at lucee.commons.io.res.util.ResourceUtil.deleteFileOlderThan(ResourceUtil.java:1407)
	at lucee.commons.io.res.util.ResourceUtil.deleteFileOlderThan(ResourceUtil.java:1407)
	at lucee.commons.io.res.util.ResourceUtil.deleteFileOlderThan(ResourceUtil.java:1407)
	at lucee.commons.io.res.util.ResourceUtil.deleteFileOlderThan(ResourceUtil.java:1407)
	at lucee.transformer.dynamic.DynamicClassLoader.cleanup(DynamicClassLoader.java:311)
	at lucee.transformer.dynamic.DynamicInvoker.cleanup(DynamicInvoker.java:396)
	at lucee.runtime.engine.Controler.control(Controler.java:537)
	at lucee.runtime.engine.Controler.control(Controler.java:285)
	at lucee.runtime.engine.Controler$ControlerThread.run(Controler.java:126)

So unless they’re not going into the global application.log, I’m not sure why they would be missing…

Edit: I logged directly into one of our clusters and, in /opt/lucee/server/lucee-server/context/logs, ran grep 'ControllerThread' *. It only finds entries at ERROR level, none from today, all in requesttimeout.log; the latest one is

requesttimeout.log:"ERROR","ControllerThread:27001","07/06/2025","06:46:10","controller","stop request (3) because run into a timeout. ATM we have 0 active request(s) and 0 active cfthreads (no path available)."

Edit 2: while there I double-checked the Lucee version; it seems like it should have the changes in?

ls server/lucee-server/patches
6.2.2.66-RC.lco