Lucee 5.2.8.50 came to a crawl

rrhodescf · August 26, 2018, 12:57pm

Hello to All,

I have 35 sites running on Lucee 5.2.8.50, with Java 1.8.0_181, and Tomcat 8.5.32.

These sites run on Windows Server 2016 Standard x64. The server has 32 GB of RAM, and the traffic on it is moderate.

I am running the database (Microsoft SQL Server 13.0.4.223.10) on another server, not on the Lucee server.

So… something strange happened yesterday – twice.

I was in the Admin, trying to add datasources, and on a few of them I was fumbling around a bit because I had the passwords wrong.

While trying to sort that out, Lucee suddenly bogged waaaaaay down, and the server’s CPU went through the roof. The server slowed to a crawl.

I finally got the Lucee Properties box open and clicked Stop, but Lucee was so bogged down that the service would not stop.

And the server was so bogged down that it took me several minutes to restart the server. Each event worked out to be about 15 minutes of down-time.

Any idea on what might cause this, and how I can avoid this?

My settings are:

-Dcatalina.home=C:\lucee\tomcat
-Dcatalina.base=C:\lucee\tomcat
-Djava.endorsed.dirs=C:\lucee\tomcat\endorsed
-Djava.io.tmpdir=C:\lucee\tomcat\temp
-Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager
-Djava.util.logging.config.file=C:\lucee\tomcat\conf\logging.properties
-Dfile.encoding=utf8

Initial Memory Pool: 2000

Maximum Memory Pool: 5000

The Thread Stack Size is blank.

Any help would be much appreciated.

Thanks,

-RR

justincarter · August 27, 2018, 11:22am

If we assume that the server is only running Lucee and IIS, and that Tomcat was definitely using all of the available CPU resources and not some other process, then typical reasons for high CPU usage could be:

a request came in that executed some code which required a huge amount of CPU time
some code could be causing memory leaks, which over time uses up available memory, which in turn causes frequent and long pauses for JVM garbage collection
many hundreds/thousands of requests came in which each required a small to moderate amount of CPU time, but in aggregate were too much for the server to handle

If you have IIS logs enabled you might be able to analyse them to see if the last point occurred and you had a flood of traffic on your server.
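As a rough sketch of that kind of log analysis, the Java below counts requests per minute from W3C extended format log lines, which is the format IIS writes by default. It assumes the log contains a `#Fields:` directive that lists `date` and `time`; the class name, method name, and sample lines are made up for illustration and are not taken from this server.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class IisRequestsPerMinute {

    // Counts requests per minute. Column positions are read from the
    // "#Fields:" directive, so the field order in the log does not matter.
    public static Map<String, Integer> countPerMinute(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        int dateIdx = -1;
        int timeIdx = -1;
        for (String line : lines) {
            if (line.startsWith("#Fields:")) {
                String spec = line.substring("#Fields:".length()).trim();
                List<String> fields = Arrays.asList(spec.split("\\s+"));
                dateIdx = fields.indexOf("date");
                timeIdx = fields.indexOf("time");
                continue;
            }
            // Skip blank lines, other directives, and anything seen before #Fields.
            if (line.isEmpty() || line.startsWith("#") || dateIdx < 0 || timeIdx < 0) {
                continue;
            }
            String[] cols = line.split("\\s+");
            if (cols.length <= Math.max(dateIdx, timeIdx) || cols[timeIdx].length() < 5) {
                continue; // malformed or truncated line
            }
            // "HH:mm:ss" truncated to "HH:mm" gives a one-minute bucket.
            String minute = cols[dateIdx] + " " + cols[timeIdx].substring(0, 5);
            counts.merge(minute, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical sample lines, not real traffic.
        List<String> sample = Arrays.asList(
            "#Fields: date time cs-method cs-uri-stem sc-status",
            "2000-01-01 10:00:01 GET /index.cfm 200",
            "2000-01-01 10:00:45 GET /index.cfm 200",
            "2000-01-01 10:01:02 GET /about.cfm 200");
        countPerMinute(sample).forEach((minute, n) -> System.out.println(minute + "  " + n));
    }
}
```

A minute whose count is far above the usual level around the time of the slowdown would point to a traffic flood; flat counts would point away from it.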

If that wasn’t the case then tracking down the culprit can be more difficult because you need some insight / data to work with. Tools like FusionReactor are incredibly helpful when you just don’t know where to start looking, because you can monitor the JVM metrics (CPU, RAM, garbage collections), you can alert if CPU usage is high, and you can see long running requests or how deep your request queue is.
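Without a monitoring tool, a few of the same JVM metrics can be read through the standard `java.lang.management` API. The sketch below is only an illustration: the class and method names are hypothetical, and it reports on the JVM it runs in, so to say anything about Lucee it would have to run inside the Tomcat JVM or be adapted to connect over JMX.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

public class JvmSnapshot {

    // Builds a short text report of heap usage, GC activity, and thread count
    // for the JVM this code is running in.
    public static String snapshot() {
        long mb = 1024L * 1024L;
        StringBuilder out = new StringBuilder();

        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        out.append("heap used MB: ").append(heap.getUsed() / mb).append('\n');
        // getMax() returns -1 when the maximum is undefined.
        long max = heap.getMax();
        out.append("heap max MB: ").append(max < 0 ? "undefined" : String.valueOf(max / mb)).append('\n');

        // Collection counts and accumulated times are totals since JVM start.
        // Totals that climb quickly between two snapshots suggest GC pressure.
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            out.append("gc ").append(gc.getName())
               .append(" count=").append(gc.getCollectionCount())
               .append(" timeMs=").append(gc.getCollectionTime())
               .append('\n');
        }

        out.append("live threads: ")
           .append(ManagementFactory.getThreadMXBean().getThreadCount())
           .append('\n');
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(snapshot());
    }
}
```

Taking two snapshots a minute apart and comparing the GC totals gives a crude answer to whether the CPU is going to garbage collection rather than to request processing.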

Depending on what you find, there could be any number of different solutions to your problem(s). Good luck.

justincarter · August 27, 2018, 11:25am

To touch on this point directly, I’m not aware of any current issues related to the Lucee admin and CPU usage specifically. Perhaps @micstriit might have some advice.

Jordan_Michaels · August 27, 2018, 5:07pm

Did you look at your server’s processes at all? Do you know what process was taking all of your CPU? You just said CPU was through the roof, not what process was eating it.

dennis · August 27, 2018, 5:10pm

For troubleshooting CPU usage issues I rely on Process Explorer.

It is a staple utility I run on all of our servers and has helped me track down more issues than I can count. At a quick glance you can see what process or subprocess is consuming memory and/or CPU time.

SysInternals was purchased by Microsoft but their utilities are a godsend for Windows troubleshooting.

rrhodescf · September 8, 2018, 6:45pm

So, it happened again.

I deleted a datasource and then Lucee slowed waaaaay down, and then my sites all went down.

I opened the Lucee Properties box, and then clicked stop, but the service did not stop. I waited for 5 minutes, and still, the service did not stop. So I rebooted the server.

The one thing I remember as odd is that I had two datasources, with different names, pointing to the same database. So I deleted one of the datasources. That’s when Lucee started bogging down.

And this event reminds me that the last time this happened, I also had two datasources pointing to the same database, and I deleted one of them.

I am running Lucee 5.2.8.50, Tomcat 8.5.32, and Java 1.8.0_181 on IIS 10, and the server has 32 GB of RAM. I am using jTDS 1.3.1 (JDBC 3.0).

When this happened, I meant to open up Task Manager to see what was using all the resources, but I was very intent on getting the sites back up.

The next time it happens I will try to remember to open Task Manager and see what is going on there.

Zackster · September 8, 2018, 8:09pm

Check your logs? Was the data source in use?

Lucee logs full stack traces for errors, which can be really verbose and might be hammering your server’s disk.
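One quick way to check for that is to list the largest files under the log directories and see whether any are growing fast. The sketch below is a minimal illustration with hypothetical class and method names. The directory is passed in as an argument because the thread does not say where Lucee writes its logs on this server; with `catalina.base` set to `C:\lucee\tomcat`, the Tomcat logs would normally be under `C:\lucee\tomcat\logs`, but that is an assumption.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.stream.Stream;

public class LargestLogs {

    // Returns up to `limit` files under `root` (recursively), largest first,
    // each paired with its size in bytes.
    public static List<Map.Entry<Path, Long>> largestFiles(Path root, int limit) throws IOException {
        List<Map.Entry<Path, Long>> sized = new ArrayList<>();
        try (Stream<Path> walk = Files.walk(root)) {
            // Read each size once up front, so files that are still being
            // written cannot change size in the middle of the sort.
            walk.filter(Files::isRegularFile)
                .forEach(p -> sized.add(new SimpleEntry<>(p, sizeOf(p))));
        }
        sized.sort(Map.Entry.<Path, Long>comparingByValue().reversed());
        return new ArrayList<>(sized.subList(0, Math.min(limit, sized.size())));
    }

    // Files that cannot be read are reported with size -1 instead of aborting.
    private static long sizeOf(Path p) {
        try {
            return Files.size(p);
        } catch (IOException e) {
            return -1L;
        }
    }

    public static void main(String[] args) throws IOException {
        // Defaults to the current directory when no path is given.
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        for (Map.Entry<Path, Long> e : largestFiles(root, 10)) {
            System.out.println(e.getValue() + "  " + e.getKey());
        }
    }
}
```

Running it twice a few seconds apart while the server is struggling would show whether a particular log file is growing quickly.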