Again, you’re focused on what your log says–from the request having been run. We await your getting evidence (such as via fr) of whether a) it’s BEING called from lucee, or indeed if b) FR shows it TRYING to run when called–which is a separate point–such as if it tries but fails to run.
I appreciate how someone in your situation may want to focus more on what’s broken about Lucee, but we don’t yet know where the problem is.
OK…so this log that I’m talking about is from the new test task that I added last friday…the only thing it does is log a message and exit…
I figured if the allcall schedule page stopped running but this one kept running, I could add a cfhtml call there that could log any problems, but apparently both tasks stopped executing.
Anyways, I just found out how to set scheduler.log level to INFO …looks like that’s what I needed to see the client side of the scheduler request…looks like it’s working so far.
Turns out those times match when our VMWare snapshots are scheduled…
We just tried a manual snapshot and …BAM, the system clock jumped ahead 8 hours! …actually, 7 hours, 56 minutes, 49 seconds…The logs look exactly like what we’ve been seeing.
No idea why yet.
Also, don’t know when the clock is getting corrected. I’m not seeing any corresponding backwards time jumps in the logs when ntp server fixes the system time, except for when I ran it manually, just now…
…also doesn’t explain why it would work with Windows task scheduler but not Lucee’s scheduler…
OK, I think I have an explanation for everything now…
The VMWare host clock had the wrong tz set, and no ntp time management (so almost, but not exactly 8 hours fast).
Apparently, running a snapshot syncs the vm time with the host time.
Lucee scheduler continues running every minute, but it thinks it is 8 hours ahead.
Windows eventually checks the ntp time server and corrects the system clock.
Lucee task scheduler looks at our scheduled tasks and checks against the last time they were run. Now it looks like the previous trigger was 8 hours in the future, so it doesn’t run any tasks until 1 minute after the last run time.
Windows task scheduler doesn’t remember/doesn’t care when the task was last triggered…it just keeps running every minute, so once the clock is fixed, the schedule gets back into the correct state.
Great sleuthing there, and glad you solved it. Again, with fr we could have confirmed that indeed lucee was not even TRYING to call the task, which at least would have take a LOT of other possible explanations off the table. Often such troubleshooting is about removing suspects, which allows you to focus on remaining ones–or even to discover new ones you’d never considered. Glad you could get there on your own.
So, I’ve reported the problem (incorrect vm host clock), and the solution (set the damn clock correctly! )
But now I’m getting pushback…many of our clients run their own VM hosts that we don’t control…and apparently telling them to set their clocks correctly is not a reasonable request (???)
@bdw429s What if Lucee’s scheduler could treat task execution timestamps that are set in the future as invalid…basically forcing a reset and treating it as a “first execution”?
Lucee has its own timezone setting. Does it help if Lucee at least has the correct timezone? Lucee also has a time server setting but I’m unclear if it sets the system time or only affects Lucee.
The issue is that the machine hosting the vm might have the wrong time…
Windows and Lucee have the correct time, but it seems that when we run a vm snapshot for backups, it syncs the vm system time with the host time, potentially a very incorrect time (e.g., 8 hours in the future)
This screws up any events whose schedule is based on time of day…fortunately, it happens at 2 am, so nobody notices, and Windows fixes the system clock with a time server within the hour…the next time the scheduled task is run (by lucee), it fixes the scheduled event states (in our app’s “allcall” schedule).
The problem is, for the brief period of time that the system clock was set in the future, Lucee’s scheduler runs the tasks as if that was the correct time, and then, when the clock is corrected, Lucee’s scheduler sees that it last ran this task 8 hours in the future, so it doesn’t try running it again until then…essentially, the scheduler is stalled out for 8 hours, in this case.
I’m suggesting that if Lucee’s scheduler finds that the last time a task was run was in the future, just clear that timestamp and run the task as if it was just starting up (in scheduler.log, it reports “First Execution”)
@bdw429s …and yes, I realize that this is a dumb thing…but it did happen.
And it was very tough to figure out why it wasn’t working because the scheduler’s event timestamps are private, so all I can see from the outside is that the minutes are ticking by but the scheduler isn’t running the “every 1 minute” task.
So it seems to me that just a simple check in the scheduler to make sure we aren’t dealing with timestamps far in the future would be a quick solution…
To reproduce the issue, just set up a Lucee task scheduled to run every minute, then set the system time to some time in the future (say, 10 minutes). Then, wait at least 1 minute, until the task is executed once, then set the clock back to the correct time. Now Lucee scheduler will not attempt to run the task again until the clock passes the incorrect time we set earlier.
Now do the same thing but “accidentally” set the clock 1 year ahead…