Lucee instances crashing on Windows EC2 when running multiple CommandBox servers (JVM memory pressure)

Hello everyone,

I’m facing stability issues while running multiple Lucee applications using CommandBox on a Windows EC2 instance, and I’d like guidance on best practices.

Environment

  • OS: Windows EC2
  • RAM: 16 GB (tested also with higher RAM)
  • Lucee running via CommandBox
  • Total applications: 6
  • Setup: One Lucee instance per application (6 JVMs)
  • IIS used as a reverse proxy
  • PostgreSQL + PgAdmin running on the same EC2
  • Development environment (very low traffic: ~1000–1300 requests/day)

Issue

Even after explicitly configuring JVM heap sizes via server.json (using -Xms / -Xmx), the java.exe processes are being terminated automatically by Windows under memory pressure. This happens intermittently, especially during restarts or when multiple Lucee instances start together.

Increasing RAM helps temporarily, but the issue reappears. This makes the setup unreliable for development.

What we’ve tried

  • Explicit JVM heap configuration per app (avoiding auto heap)
  • Reducing heap sizes
  • Increasing EC2 RAM
  • Ensuring page file is enabled
  • Staggered server startup

Despite this, crashes still occur when running multiple JVMs on Windows.

Question

What is the recommended approach for running multiple Lucee applications on a single Windows EC2 instance using CommandBox?

Which exact version of Lucee / Java are you running?

Postgres and 6 instances with only 16 GB of RAM is really not a lot of headroom, given that the OS also uses RAM.

How much memory does Lucee use after you just hit the application once and it has loaded up?

What heap sizes have you tried?
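To make the headroom concern concrete, here is a rough budget sketch. Only the 16 GB total and the count of 6 JVMs come from the original post; every per-process figure (heap size, non-heap overhead, Postgres, OS/IIS) is an assumption chosen for illustration, not a measurement. The one thing that is not an assumption is that a JVM process uses more memory than its -Xmx value, because metaspace, thread stacks and the code cache live outside the heap.

```python
# Rough memory budget for the setup described in the original post.
# Only TOTAL_RAM_MB and JVM_COUNT come from the thread; the rest are
# illustrative assumptions and should be replaced with measured values.

TOTAL_RAM_MB = 16 * 1024   # from the post: 16 GB instance
JVM_COUNT = 6              # from the post: one JVM per application

NON_HEAP_MB = 512          # assumed per-JVM overhead outside the heap
POSTGRES_MB = 1024         # assumed
OS_AND_IIS_MB = 3072       # assumed


def headroom_mb(heap_mb: int) -> int:
    """Return the RAM left over (in MB) if every JVM fills its heap."""
    jvm_total = JVM_COUNT * (heap_mb + NON_HEAP_MB)
    return TOTAL_RAM_MB - jvm_total - POSTGRES_MB - OS_AND_IIS_MB


if __name__ == "__main__":
    for heap in (512, 1024, 2048):
        print(f"-Xmx{heap}m per JVM -> headroom {headroom_mb(heap)} MB")
```

Under these assumed numbers a 1 GB heap per JVM leaves some slack, while a 2 GB heap per JVM overcommits the machine. That is why the actual heap sizes and the observed per-process memory matter when diagnosing this.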

@Zackster I have split the servers up as mentioned above and am now running only 1 instance of Lucee; Postgres has been removed, etc. It is still crashing. Here is my server.out.txt file. It was running fine for 2 years; this is our dev environment.

Also, @Zackster and @bdw429s, for our production environment we are thinking of running AWS t2.medium servers with 4 GB of RAM, but making them elastic so that more spin up if we need them. What are your thoughts on that instance size? It will only be running 1 site.

server.out.txt (1.2 MB)

Also, I started the server with --verbose and see the following. I do not want to have to set up our entire dev environment again; the servers were working fine until recently.

WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by lucee.commons.lang.ClassUtil (c:\Users\Administrator.CommandBox\server\50B1730A201E78722018ECD6A7AD5026-admin\lucee-5.4.6.9\WEB-INF\lucee-server\patches\5.4.6.9.lco) to constructor com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderFactoryImpl()
WARNING: Please consider reporting this to the maintainers of lucee.commons.lang.ClassUtil

That’s just a warning, not an error. It has nothing to do with memory and does not affect your server’s ability to run.

As for the log file you posted, I see no memory-related issues in there. Again, I asked this before, but what proof do you have of this “memory pressure” you speak of? Are there memory-related errors? Did you see something in this log which made you think there were memory issues, or is this just some random log file?

Also, look at the ColdBox warning and refactor your renderView() calls so those warnings will stop. They prolly have nothing to do with memory, but are filling 90% of your log files.

renderview() has been deprecated, please update your code to view()

@bdw429s thank you and will do. I am not sure what happened with the server, but going to spin up a new one and stop trying to deal with troubleshooting… quicker to do this.

Wait, so now you’re not sure if you were even having memory issues at all? I’m still confused about what initially caused you to say there was a memory issue. Also, I asked additional follow-up questions in the Slack thread that I’m waiting for you to answer.

@bdw429s That is my development team; I am stepping in to investigate as well. I will go over what you asked in the Slack channel with them and get back to you. They initially thought it was memory because after they increased the heap size and edited the JVM variables, everything started crashing. They then removed all the variables and all the sites still stopped unexpectedly, which is why they thought it was a memory issue. I deleted the felix folder thinking it was a corrupted cache, but that did not fix it either. It is crashing on a machine that had 32 GB of memory with only 8 GB in use, so I am really not sure what caused everything to go haywire. One last item to note: I made an AMI of the server and launched 6 separate ones, one for each application, so only 1 application is running on each server, and they are still stopping unexpectedly.

@bdw429s Here is the latest log, which shows it stopping. It does not look like a memory issue, but this started out of nowhere after running fine for 2 years. Please forgive me if my comments are elementary, as I am not on the development side.

Undertow: Attempting to authenticate /index.cfm, authentication required: false
Undertow: Authentication outcome was NOT_ATTEMPTED with method io.undertow.security.impl.CachedAuthenticatedSessionMechanism@7d90880a for /index.cfm
Undertow: Contents of exchange after authentication attempt is HttpServerExchange{ GET /index.cfm}
Undertow: Authentication result was ATTEMPTED for /index.cfm
[TRACE] Server Rules: Regex pattern [^/(.+?.cf[cms])(/.)?$] MATCHES input [/index.cfm] for HttpServerExchange{ GET /index.cfm}.
[TRACE] Server Rules: Storing regex match group [0] as [/index.cfm] for HttpServerExchange{ GET /index.cfm}.
[TRACE] Server Rules: Storing regex match group [1] as [index.cfm] for HttpServerExchange{ GET /index.cfm}.
[TRACE] Server Rules: Storing regex match group [2] as [null] for HttpServerExchange{ GET /index.cfm}.
Undertow: Attempting to authenticate /index.cfm, authentication required: false
Undertow: Authentication outcome was NOT_ATTEMPTED with method io.undertow.security.impl.CachedAuthenticatedSessionMechanism@7d90880a for /index.cfm
Undertow: Contents of exchange after authentication attempt is HttpServerExchange{ GET /index.cfm}
Undertow: Authentication result was ATTEMPTED for /index.cfm
[TRACE] Server Rules: Predicate [path-prefix( { '/flashservices/gateway', '/messagebroker', '/openamf/gateway', '/cfform-internal', '/CFFormGateway', '/flex2gateway', '/flex-internal' } )] resolved to false for HttpServerExchange{ GET /index.cfm}.
[TRACE] Server Rules: Regex pattern [.(mxml|cfswf)$] DOES NOT MATCH input [/index.cfm] for HttpServerExchange{ GET /index.cfm}.
[TRACE] Server Rules: Predicate [regex( pattern='.(mxml|cfswf)$', value='%{RELATIVE_PATH}', full-match='false', case-sensitive='true' )] resolved to false for HttpServerExchange{ GET /index.cfm}.
Undertow: Matched default handler path /index.cfm
[TRACE] Server Rules: Regex pattern [^/(.+?.cf[cms])(/.)?$] MATCHES input [/index.cfm] for HttpServerExchange{ GET /index.cfm}.
[TRACE] Server Rules: Storing regex match group [0] as [/index.cfm] for HttpServerExchange{ GET /index.cfm}.
[TRACE] Server Rules: Storing regex match group [1] as [index.cfm] for HttpServerExchange{ GET /index.cfm}.
[TRACE] Server Rules: Storing regex match group [2] as [null] for HttpServerExchange{ GET /index.cfm}.
Undertow: Attempting to authenticate /index.cfm, authentication required: false
Undertow: Authentication outcome was NOT_ATTEMPTED with method io.undertow.security.impl.CachedAuthenticatedSessionMechanism@7d90880a for /index.cfm
Undertow: Contents of exchange after authentication attempt is HttpServerExchange{ GET /index.cfm}
Undertow: Authentication result was ATTEMPTED for /index.cfm
Undertow: Starting to write response for HttpServerExchange{ GET /index.cfm}
Undertow: suspend
Undertow: Opened connection with /0:0:0:0:0:0:0:1:61043
Undertow: No content length or transfer coding, starting next request
Undertow: Matched default handler path /index.cfm

Server’s output stream closed. It’s been stopped elsewhere.

Stopping server…

ERROR (6.2.1+00830)

Server process returned failing exit code [-1073740940]

I don’t see anything wrong here at all, just normal log entries. You do have trace level logging enabled, which is quite a lot of unnecessary stuff.

This just means that the server process exited from outside of CommandBox. This could happen, for example, when you stop a server via the Windows tray icon. Can you elaborate on how you are running your servers? Are you using Windows services? If so, explain the setup.

Now this is useful info. From my Googles, that exit code allegedly maps to 0xC0000374, which is Windows’ exception code for heap corruption. If you are running the server as a Windows service using a tool like NSSM, then configuring service logs may reveal more info. Also check for a Java HotSpot dump in the working directory of the java process in case there was a JVM panic.
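The mapping from the exit code to that Windows status code is easy to check by hand. The sketch below reinterprets the signed 32-bit exit code reported by CommandBox as an unsigned value and prints it in hex. The function name is a hypothetical helper, not part of CommandBox or any library.

```python
def to_ntstatus_hex(exit_code: int) -> str:
    """Reinterpret a signed 32-bit process exit code as unsigned hex.

    Windows status codes are unsigned 32-bit values, but tools often
    print them as signed integers, which is why they show up negative.
    """
    return f"0x{exit_code & 0xFFFFFFFF:08X}"


if __name__ == "__main__":
    # Exit code taken from the CommandBox error message in this thread.
    print(to_ntstatus_hex(-1073740940))  # 0xC0000374
```

The same trick works for any negative exit code a Windows process reports, which makes the value searchable in Microsoft's NTSTATUS listings.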

@bdw429s So far this is our development environment: Windows Server 2022 with 4 GB of RAM, min/max heap set to 2048. It is stopping when there is zero traffic on the application. I mean, literally. We start it up, and sometimes it stays running for 15 minutes, other times 1 hour, other times 2 minutes. I tried increasing the instance size to 16 GB, and it is the same thing, it just crashes. We start the server from the CommandBox command line with server start, and the tray icon disappears when the server stops. I purchased 5 hours of consulting from Ortus; we need this resolved by tomorrow.

Can you answer the rest of my questions?

@bdw429s We are not running the ForgeBox service manager. We will turn that on for our production environment, as we will have autoscaling in our webserver pool.

We run our servers as a standalone CommandBox application. As I said, I moved each application to its own Windows server. BonCode is set to the single site that is running.

I am not sure which questions I have not answered. I don’t know what a Java panic is or where the working directory of the java process is. As I mentioned, I am on the product side, not the development side, so much of this is foreign to me; all I know is that something does not make sense. I launched a completely fresh server with a clean installation, not from an image, and the same issue is happening.

You haven’t clearly answered if you are running the server as a Windows service or not, and if so, how.

A JVM panic is the JVM equivalent of the Windows blue screen of death, and when it happens, Java usually writes out a file of debugging information on the state of the JVM when it died.

The working directory is probably the dir that box.exe was started in, but that really depends on how you’re running the server. No matter, just search the hard drive for files with hs_err_pid in the name.
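For anyone who prefers a script over the Windows search box, a minimal sketch of that search follows. It assumes the JVM used its default crash-log naming (hs_err_pid followed by the process ID), and the function name and the C:\ starting point are illustrative choices.

```python
import os


def find_jvm_crash_logs(root: str) -> list[str]:
    """Return paths under `root` whose file name contains 'hs_err_pid'."""
    matches = []
    # onerror swallows permission errors so protected folders are skipped.
    for dirpath, _dirnames, filenames in os.walk(root, onerror=lambda err: None):
        for name in filenames:
            if "hs_err_pid" in name:
                matches.append(os.path.join(dirpath, name))
    return matches


if __name__ == "__main__":
    # Assumed starting point; narrow it to the site folder to search faster.
    for path in find_jvm_crash_logs("C:\\"):
        print(path)
```

An empty result does not rule out a native crash. It only means the JVM did not get the chance to write its own crash report.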

@bdw429s We are NOT running it as a Windows service. As I said, we start it up using the CommandBox interface.

Ok, I am searching for hs_err_pid, and the blue screen comparison resonates.

As a note, we start the server by opening CommandBox, navigating to the website directory in CommandBox, and running server start.
I searched and there are no files with hs_err_pid in the name anywhere.

Ah, I see. So someone is just manually remoting into the server any time it restarts and manually running box server start in a terminal which they then need to leave open? I can’t say I’ve heard of that. What happens if Windows logs off that user and kills all their processes?
I’m not understanding your reasoning for not using a service, but no matter, that answers my question. So then this is what that means: the logs you posted above, were they copied directly from that open terminal, or just taken out of the server.log.txt file for that CommandBox server home? I ask because sometimes low-level Java errors will be printed to the console but NOT written to the log file, because the JVM dies before that can happen. I specifically want to know if there is additional information showing up in the actual terminal where the server was started. If you’re not using the --console flag, then start doing that, so you can get a direct look at the console text as it streams out.

Depends on what you mean by that. How do you run it? That would control the working dir.

Fair enough. In that case, the working dir prolly doesn’t matter. I just wanted to check just in case.

In the development environment, this is how we have had it set up. In our other application’s production environment, it runs as a Windows service. I temporarily ran this one as a Windows service (using the ForgeBox service manager) and it still crashed, but then the tray icon would not appear even after running server set trayEnable=true, so I did not know how to get back to the server manager.

Well yes, that’s a limit of Windows services. They don’t run inside your user’s profile and can’t add anything to the tray, which is unfortunate. Usually you just have to live without the tray icon for servers you start as a service due to this Windows limitation. The tray icon is great for local dev sites you’re spinning up and down all the time, but for most people, dev/test/prod environments are different in that you simply want the server to come online as soon as the machine boots without any intervention, and you just use the Windows service UI (or the CLI) to stop/start/restart it.

As I suspected it would. I didn’t ask about running as a service because I thought it would affect your crashing issue; I asked as a means to understand what options we had to find the console output.

@bdw429s Thank you to your team. Your Ortus team worked with our developers last night; it was a Java 11 versus Java 8 issue. Java 11 was causing instability for whatever reason.