Solved 503 Timeout or Service Temporarily Unavailable

Hello everyone, my Lucee installation runs on a Plesk server with AlmaLinux.
Occasionally, during complex and long-running processes involving thousands of records (even more than 10k), my applications would throw 503 Timeout or Service Temporarily Unavailable errors.

Finally, I found a configuration that seems to have resolved the issue.

Obviously, many of the suggestions come directly from ChatGPT.

Do you have any other recommendations to improve Lucee’s performance?
This could serve as a mini guide.
Thank You.

On Plesk
Additional Apache directives:

ProxyTimeout 600
Timeout 600

ModPagespeed off

Additional Nginx directives:

proxy_connect_timeout 600s;
proxy_send_timeout 600s;
proxy_read_timeout 600s;
send_timeout 600s;
proxy_buffering off;

On the server:

/opt/lucee/tomcat/conf/server.xml

Modify the connector:
OLD

NEW
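The connector XML above did not survive posting (the forum stripped the tags), so OLD and NEW are empty. For reference, the usual change in server.xml is to raise `connectionTimeout` on the HTTP connector so Tomcat itself does not cut off long requests; a hypothetical before/after (port and values are illustrative — check your own install):

```xml
<!-- OLD: stock connector with the default 20 s connection timeout -->
<Connector port="8888" protocol="HTTP/1.1"
           connectionTimeout="20000"
           redirectPort="8443" />

<!-- NEW: raised to match the 600 s proxy timeouts configured above -->
<Connector port="8888" protocol="HTTP/1.1"
           connectionTimeout="600000"
           redirectPort="8443" />
```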

nano /opt/lucee/tomcat/bin/setenv.sh

#!/bin/bash

Memory

export JAVA_OPTS="-Xms2048m -Xmx4096m"

Metaspace (classes)

export JAVA_OPTS="$JAVA_OPTS -XX:MetaspaceSize=512m -XX:MaxMetaspaceSize=2048m"

TLS

export JAVA_OPTS="$JAVA_OPTS -Dhttps.protocols=TLSv1.2 -Djdk.tls.client.protocols=TLSv1.2"

Garbage collector (G1GC)

export JAVA_OPTS="$JAVA_OPTS -XX:+UseG1GC -XX:MaxGCPauseMillis=200 -XX:InitiatingHeapOccupancyPercent=35 -XX:+ParallelRefProcEnabled"

General optimizations

export JAVA_OPTS="$JAVA_OPTS -XX:+HeapDumpOnOutOfMemoryError -XX:+ExitOnOutOfMemoryError -Djava.awt.headless=true -Dfile.encoding=UTF-8"

GC log (optional, useful for monitoring) – I did not include it.
Note: -XX:+PrintGCDateStamps is a Java 8 flag that was removed in Java 9+ (the JVM refuses to start with it); the "time" decorator in -Xlog already adds timestamps, so it is dropped here.

export JAVA_OPTS="$JAVA_OPTS -Xlog:gc*:file=/opt/lucee/tomcat/logs/gc.log:time,uptime,level,tags"
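One pitfall with pasting these lines from the web: curly "smart" quotes will break setenv.sh. A quick sanity check before restarting Tomcat is to rebuild the options the same way with plain ASCII quotes and inspect the result (values taken from the script above, shortened here):

```shell
#!/bin/bash
# Rebuild JAVA_OPTS exactly as setenv.sh does, using plain ASCII double
# quotes (curly quotes pasted from a web page will break the script).
export JAVA_OPTS="-Xms2048m -Xmx4096m"
export JAVA_OPTS="$JAVA_OPTS -XX:+UseG1GC -XX:MaxGCPauseMillis=200"

# Print the accumulated options; on a box with Java installed you can run
# "java $JAVA_OPTS -version" instead, which fails fast on any unknown flag.
echo "$JAVA_OPTS"
```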

Lucee Log Rotation
New file:

nano /etc/logrotate.d/lucee

content:

/opt/lucee/tomcat/logs/catalina.out {
    daily
    rotate 14
    compress
    missingok
    copytruncate
}
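To verify the new rule without waiting for the nightly run, logrotate can be dry-run in debug mode (this only prints what it would do, it rotates nothing):

```shell
logrotate -d /etc/logrotate.d/lucee
```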

Is your batch processing more of a single script, or are you using functions etc.? Using functions is going to be way more GC friendly.
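The point in concrete terms: variables declared with `var`/`local` inside a function become unreachable as soon as the function returns, so each row's temporaries can be collected right away instead of accumulating in the request's variables scope for the whole batch. A hypothetical CFML sketch (function and helper names are illustrative, not from the original post):

```cfml
// GC-friendly batch shape: every per-row temporary lives in the
// function's local scope and is collectable once processRow() returns.
function processRow( required struct row ) {
    var address = normalizeAddress( row.address );   // hypothetical helper
    var isValid = validateTaxCode( row.taxCode );    // hypothetical helper
    if ( isValid ) {
        queryExecute(
            "INSERT INTO import_rows ( address, tax_code ) VALUES ( :addr, :code )",
            { addr = address, code = row.taxCode }
        );
    }
}

// The loop keeps almost nothing alive between iterations:
for ( row in rows ) {
    processRow( row );
}
```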

Aside from increasing the proxy timeout, I’d say increasing the memory is what is making the main difference.

The 503 is more Tomcat saying "I'm busy / stuck", probably in heavy GC.

I’d also suggest running any batch stuff in its own Lucee instance, I’ll give you a free enterprise license for as many cores as you like :slight_smile:

I’ve been using Java Flight Recorder a lot these days; it’s a really useful way to diagnose performance over time, i.e. your batch.

You can create custom profiles (JFC) to choose which aspects of Java behaviour to record, and then get your LLM to query the file using the command-line jfr tool (included with the JDK; it’s not in the JRE).

-XX:StartFlightRecording=name=orm-leak,settings=/path/to/orm-leak-diagnostic.jfc,duration=20m,filename=/tmp/orm-leak.jfr
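Once the recording has been written, the jfr tool that ships with the JDK can summarize or dump it; for example (file path taken from the flag above, the event name is a standard JDK event):

```shell
# High-level overview: event counts and sizes in the recording
jfr summary /tmp/orm-leak.jfr

# Dump allocation samples as JSON so they can be grepped or handed to an LLM
jfr print --json --events jdk.ObjectAllocationSample /tmp/orm-leak.jfr
```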

orm-leak-diagnostic.jfc (7.4 KB)

You can also load the JFR recording into Java Mission Control (JMC) to visualise memory growth over time; if memory keeps climbing throughout the batch, try refactoring the code to be more GC friendly (i.e. more stuff into functions).

I prefer JFR over FusionReactor: JFR is very lightweight (1–2% overhead) and can be run in prod, and it also has some critical robustness improvements in Java 25.


I read over 10,000 rows from an Excel file located in a folder external to the server, mounted using s3fs.

I process the Excel rows one by one (validating tax codes, normalizing the address, searching for data in my database, and finally running a couple of queries to insert data into the database).

The actual duration of the loop is about 2 minutes.
Before the loop, several functions are defined that perform calculations.

The Excel file and the table where the data is stored both have about 40 columns.

Using functions?

I would double-check the settings you proposed before putting them into production. The TLS setting there is completely unrelated and forces all connections to TLS 1.2, which is gradually being phased out in favor of TLS 1.3 (TLS 1.2 is still used today, but will eventually be dropped). Default Lucee 6.x and 7.x installs prefer TLS 1.3.

With those settings, you are telling the server not to time out any connection until 10 minutes have gone by… That seems extremely excessive, and could tie up threads for a very long time if anything goes sideways on the server.

My recommendation: if you have a long-running process, use what @Zackster recommends and run it on another server instance – or, if you really want to keep it on that instance, bypass the proxy by connecting directly to port 8888 (or whatever port the internal Tomcat servlet is running on). Also make sure to set your CF timeout settings according to your needs.
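On that last point about CF timeout settings: in CFML the request timeout can be raised for just the batch template instead of server-wide (the 900-second value is illustrative):

```cfml
<!--- Allow this one import template to run for up to 15 minutes --->
<cfsetting requestTimeout="900">
```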