Lucee's NTP Client Does Not Respect Certain Parts of the Specification

Posting here before logging a defect as requested on the Jira page.

There are various issues in Lucee’s handling of NTP server responses that are standards-compliant but not happy-path when “use time server” is checked in the “Regional” settings (as is the default). This leads to vastly incorrect dates being returned by Lucee. The issue exists in Lucee 4 through 6 at least.

Below are some examples of how Lucee mishandles NTP server responses. No matter what, if Lucee receives a parseable response from the NTP server, it will use the offset given (see /core/src/main/java/lucee/runtime/net/ntp/NtpClient.java), even if it is obviously incorrect and the NTP server is trying to communicate that the data should not be used.

NTP responses carry several header fields. The two relevant here are the stratum, which indicates the source of the time information, and the leap indicator, which indicates whether a leap second will occur in the next day.

While the original NTP specification said that a stratum of 0 had unspecified behavior (reference (rfc2030#section-4)), the updated RFC (rfc4330) specifies that a stratum of 0 marks a “kiss-of-death” message, defined later in the document (rfc4330#section-8). One common use of a kiss-of-death response is rate limiting.

Simply put, a stratum of 0 should not be processed as a legitimate response.

Of note, as discussed here (https://www.ntp.org/documentation/4.2.8-series/rate/): “In order to make sure the client notices the KoD [Kiss of Death] packet, the server sets the receive and transmit timestamps to the transmit timestamp of the client packet. Thus, even if the client ignores all except the timestamps, it cannot do any useful time computations.”

What this means for Lucee is that it sends the NTP server the time it thinks it is, and the NTP server responds with a kiss of death that includes an offset identical to the timestamp sent. Lucee adds that offset to System.currentTimeMillis(), which leads to Lucee believing the time is basically 1/1/1900. This can cause huge data integrity issues.
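To make the arithmetic concrete, here is a minimal Java sketch. It is not Lucee's code. It assumes the bogus offset comes out as the negative of the client's own NTP-era timestamp, which is one way to read the description above; the class and method names are made up for illustration. NTP timestamps count from 1/1/1900 while System.currentTimeMillis() counts from 1/1/1970, so such an offset cancels the current time entirely and leaves only the gap between the two epochs.

```java
import java.time.Instant;
import java.time.ZoneOffset;

// Hypothetical demo class, not part of Lucee.
class NtpEpochDemo {
    // Seconds between the NTP epoch (1900-01-01) and the Unix epoch (1970-01-01).
    static final long NTP_TO_UNIX_SECONDS = 2208988800L;

    // Convert Unix milliseconds to an NTP-era timestamp in milliseconds.
    static long toNtpMillis(long unixMillis) {
        return unixMillis + NTP_TO_UNIX_SECONDS * 1000L;
    }

    // Apply an offset to the local clock without any sanity check.
    static Instant applyOffset(long unixMillis, long offsetMillis) {
        return Instant.ofEpochMilli(unixMillis + offsetMillis);
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();

        // ASSUMPTION: the offset equals minus the NTP-era timestamp that was sent.
        long bogusOffset = -toNtpMillis(now);

        Instant result = applyOffset(now, bogusOffset);
        System.out.println(result); // 1900-01-01T00:00:00Z

        // In a zone west of UTC (UTC-7 is an arbitrary example) the same
        // instant falls on the previous calendar day.
        System.out.println(result.atOffset(ZoneOffset.ofHours(-7)).toLocalDate()); // 1899-12-31
    }
}
```

Under that assumption the result does not depend on the current time at all, and a server whose time zone is west of UTC would display the same instant as 12/31/1899.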

Another condition that is ignored is a leap indicator of 3, which signals an alarm condition where the clock is not synchronized (reference (rfc4330#section-4)):

“The most important indicator of an unhealthy server is the LI [leap indicator] field, in which a value of 3 indicates an unsynchronized condition. When this value is displayed, clients should discard the server message, regardless of the contents of other fields.” (rfc2030#section-6)
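As a rough illustration of the two checks described above, here is a minimal Java sketch of validating a raw NTP response before using it. It is not taken from Lucee; the class and method names are hypothetical. It relies only on the standard NTP header layout: the leap indicator is the top two bits of the first byte, the stratum is the second byte, and in a kiss-of-death packet the four-byte reference identifier at offset 12 carries an ASCII kiss code such as "RATE".

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Lucee.
class NtpResponseCheck {
    static final int MIN_PACKET_LENGTH = 48; // size of the fixed NTP header

    // Leap indicator: top two bits of the first byte.
    static int leapIndicator(byte[] packet) {
        return (packet[0] >> 6) & 0x3;
    }

    // Stratum: second byte, read as unsigned.
    static int stratum(byte[] packet) {
        return packet[1] & 0xFF;
    }

    // Reference identifier as ASCII; only meaningful as a kiss code when stratum is 0.
    static String kissCode(byte[] packet) {
        return new String(packet, 12, 4, StandardCharsets.US_ASCII);
    }

    // True only if the response may be used to compute a clock offset.
    static boolean isUsable(byte[] packet) {
        if (packet == null || packet.length < MIN_PACKET_LENGTH) {
            return false;
        }
        if (leapIndicator(packet) == 3) {
            return false; // alarm condition: server clock is not synchronized
        }
        if (stratum(packet) == 0) {
            return false; // kiss-of-death, for example a rate limit
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] normal = new byte[48];
        normal[0] = (byte) 0x24; // LI=0, version 4, mode 4 (server)
        normal[1] = 2;           // stratum 2

        byte[] kod = new byte[48];
        kod[0] = (byte) 0xE4;    // LI=3, version 4, mode 4 (server)
        kod[1] = 0;              // stratum 0
        byte[] code = "RATE".getBytes(StandardCharsets.US_ASCII);
        System.arraycopy(code, 0, kod, 12, 4);

        System.out.println(isUsable(normal)); // true
        System.out.println(isUsable(kod));    // false
        System.out.println(kissCode(kod));    // RATE
    }
}
```

A client that ran a check like this before computing the offset would discard the rate-limit response and keep the system time instead of applying a nonsensical correction.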

I implemented an example NTP server (kylec32/RateLimitedNtp on GitHub) that always responds with a rate-limit response. You can run it, point Lucee at it, and after a restart you will see that <cfdump var="#now()#"/> returns a timestamp very early on 1/1/1900.

I tried to be more helpful by linking to the appropriate sections, but as a new user I am limited to two links.

@kylec32 thanks for the great post, very informative!
We will simply remove that feature from Lucee with no replacement, which has been planned for a long time anyway. There is no need for it anymore in today's server environments.

Everyone should make sure that they have this feature disabled in the Lucee admin!

This is what we are using as our solution as well, so I fully support that. It will resolve this too, so two for one. (Issue navigator - Lucee)

@kylec32 Interesting!! I have had two servers this week suddenly set the date to 1900-01-01 for no apparent reason. I suspect that this was the cause, so a timely post. Thank you.

We just got bitten by this. Well, we just realised we got bitten by this.
For us it seemed to fall apart at about 0800 on 04 July.

Now we just have to go through all of our data since then and see if we can correct or massage the shit-fight that has occurred!

We encountered something similar, except our date is 12/31/1899, not 1/1/1900. If we disable use of the NTP time server, wouldn’t that cause time drift and therefore just another set of time-related problems?

We are currently using pool.ntp.org for NTP. For those of you reporting the issue, are you using that too? Would it be better to keep the time server enabled and switch to a different server like NIST? Or would we have the same issues when it returns KoD?

Thank you kylec32 for the detailed info!

If you disable the time server, Lucee will use the time of the server it is running on. Your server is going to already be keeping its time updated with NTP. As currently implemented, Lucee uses NTP to correct a time that is already kept up to date with NTP, and does so with an incomplete implementation of the protocol.

We were using pool.ntp.org and briefly used Amazon’s NTP service (since we were running on EC2), which did seem to work better. That said, I would still encourage you to just disable the time server functionality.

Very good. We will take your advice. Thank you!