Corruption and ghosts from scheduled tasks

I have been hitting very serious issues with scheduled tasks and the cfschedule tag. When my app starts, I clear out all tasks, then re-register them in onRequestEnd via cfthreads that call the different modules of my app, each of which registers its own tasks. The first issue manifests here: if I join my threads, my entire Lucee install becomes corrupted. Random components get deleted, I get syntax and language parsing errors that make no sense, and I have to rebuild the whole image.

<cfschedule action="list" result="v.q_schedules" />

<cfloop query="v.q_schedules">
	<cfschedule action="delete" task="#v.q_schedules.task#" />
</cfloop>

If I don’t join the threads and just let them finish after the request, the Lucee install remains intact until I restart the container, but I randomly get ghost copies of tasks that are slightly different from what I registered, and not all of the tasks get a ghost copy.
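For reference, the registration pattern looks roughly like this. This is a minimal sketch, not my actual code: the thread name, task name, URL, and interval are placeholders, and each module's real registration logic is more involved.

<cfthread name="registerModuleA" action="run">
	<!--- Each module registers its own tasks inside its thread. --->
	<cfschedule action="update"
		task="moduleA-task"
		operation="HTTPRequest"
		url="https://localhost/moduleA/job.cfm"
		startDate="#dateFormat(now(), 'yyyy-mm-dd')#"
		startTime="00:00"
		interval="3600" />
</cfthread>

<!--- With this join in place, the install eventually corrupts;
      without it, I get the ghost tasks instead. --->
<cfthread action="join" name="registerModuleA" />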

Example lucee errors:

The OSGi Bundle with name [esapi.extension] in version [2.2.4.15] for is not available locally [ (/opt/lucee/server/lucee-server/bundles)] or from the update provider [ (https://update.lucee.org)].

ERROR: Failed to download the bundle for [findbugsAnnotations] in version [3.0.1] from [https://update.lucee.org/rest/update/provider/download/findbugsAnnotations/3.0.1/?serverId=a940094252c4a7488ae793c8599315b2&serverSecurityKey=6a3f632d-dff4-497f-b901-541de33dc26b&allowRedirect=true&jv=11.0.22], please download manually and copy to [/opt/lucee/server/lucee-server/bundles]

com/mysql/cj/protocol/a/SqlDateValueEncoder$1

Once Lucee is corrupted, I have to add --force-recreate to docker-compose to repair it.

In my dev environment, all tasks register as paused, and that is reflected in the admin UI. But these ghost ones are running, and they can’t be stopped. Deleting all tasks from the admin or using cfschedule doesn’t work. All tasks do disappear from the admin, but these ghost ones persist.

When my app registers its tasks, the .CFConfig.json updates to show them, though they are all missing the paused attribute, but in the admin they do show as paused. If I pause them in the admin, the attribute gets added in the json. If I delete them, they are gone from the json. But regardless of being paused or deleted, if there’s a ghost one it’ll keep trying to execute.
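For illustration, a registered entry in .CFConfig.json ends up looking something like the following. This is a hand-written sketch, not copied from my file; the exact key names and values are assumptions and may differ, but the point is that no paused key is present even though the admin shows the task as paused:

"scheduledTasks": [
	{
		"name": "moduleA-task",
		"url": "https://localhost/moduleA/job.cfm",
		"startDate": "2024-01-01",
		"startTime": "00:00:00",
		"interval": "3600"
	}
]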

Sometimes I get this error in both the admin and using cfschedule: lucee.runtime.exp.ExpressionException:can't delete schedule task [taskName], task doesn't exist. In the admin I’ve selected clearly visible tasks to delete, and with cfschedule I’m just looping through the query that the list action produced.
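A defensive variant of the delete loop, sketched below, at least survives that error so one phantom entry doesn’t abort the rest of the loop, though it does nothing about the ghosts themselves:

<cfschedule action="list" result="v.q_schedules" />

<cfloop query="v.q_schedules">
	<cftry>
		<cfschedule action="delete" task="#v.q_schedules.task#" />
		<cfcatch type="expression">
			<!--- The task was listed but is already gone; log and continue. --->
			<cflog text="Could not delete task #v.q_schedules.task#: #cfcatch.message#" />
		</cfcatch>
	</cftry>
</cfloop>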

Since this is more of a mystery than a bug report, where can I look for more clues? There has to be a cache somewhere, or maybe a quirk of the request lifecycle’s interaction with cfschedule. The ability to destroy a Lucee install is particularly troubling, since I don’t think I’m doing anything very interesting.

I’m running the official Docker build, lucee/lucee:6.0.0.585-SNAPSHOT-light, upgraded to 6.0.1.83, in Single Mode. Here is the relevant part of my Dockerfile:

ENV LUCEE_ADMIN_ENABLED true
ENV LUCEE_SERVER /opt/lucee/server/lucee-server

ADD https://cdn.lucee.org/6.0.1.83.lco "${LUCEE_SERVER}/deploy/6.0.1.83.lco"
ADD https://ext.lucee.org/lucee.admin.extension-1.0.0.5.lex "${LUCEE_SERVER}/deploy/lucee.admin.extension-1.0.0.5.lex"
ADD https://ext.lucee.org/ehcache-extension-2.10.0.37-SNAPSHOT.lex "${LUCEE_SERVER}/deploy/ehcache-extension-2.10.0.37-SNAPSHOT.lex"
ADD https://ext.lucee.org/esapi-extension-2.2.4.15.lex "${LUCEE_SERVER}/deploy/esapi-extension-2.2.4.15.lex"
ADD https://ext.lucee.org/lucee.image.extension-2.0.0.26-RC.lex "${LUCEE_SERVER}/deploy/lucee.image.extension-2.0.0.26-RC.lex"
ADD https://ext.lucee.org/com.mysql.cj-8.1.0.lex "${LUCEE_SERVER}/deploy/com.mysql.cj-8.1.0.lex"
ADD https://ext.lucee.org/org.postgresql.jdbc-42.6.0.lex "${LUCEE_SERVER}/deploy/org.postgresql.jdbc-42.6.0.lex"

And my dev docker-compose:

version: "3.8"

services:
  lucee:
    build:
      context: .
      dockerfile: Dockerfile-local
    volumes:
      - ./:/var/www
      - ./password.txt:/opt/lucee/server/lucee-server/context/password.txt
      - ./LuceeSettings/.CFConfig.json:/opt/lucee/server/lucee-server/context/.CFConfig.json
      - ./LuceeSettings/server.xml:/usr/local/tomcat/conf/server.xml
    restart: always
    ports:
    - "80:80"

Maybe I can focus the question to keep troubleshooting: what does cfschedule action="update" actually produce, besides an entry in /opt/lucee/server/lucee-server/.CFConfig.json (single mode) or /WEB-INF/lucee/.CFConfig.json (multi mode)? I’ve tried both single and multi mode and get the same issue. Something is trying to execute a task that either doesn’t exist in those files, or was created in a paused state (the admin UI shows it as paused, though the entry in .CFConfig.json is always missing the "paused" attribute). Maybe those are separate issues, so let me stick with the first question: where does the actual scheduled task execution happen, and what is its source of tasks?