Connection Refused - Tomcat down

I’m looking for help on how to prevent Lucee/Tomcat from crashing every 45
to 90 minutes. I need to get this Lucee server stabilized. It may be that I
just need to do some configuration changes - bump up memory and the like
but I’m a Lucee neophyte.

FYI - I built a new Lucee server on an AWS EC2 instance running Amazon
Linux. It worked flawlessly in development. We did testing and even some
load testing (though apparently not enough). Today we took it live and
after a bit started getting “Connection Refused” messages from the website.
It took me a bit but I figured out that the web server was down so I SSH’d
into the server and restarted Lucee/TC and in 20 seconds it was back up.
Unfortunately after 45 to 90 mins it would crash again.

In the afternoon after a crash I patched Lucee and that held for a couple
hours but I think that just might have been because users lost confidence
and the load was lessened. It has just crashed again.

I’ve attached the log file from one of the crashes.

Thanks in advance.

A few things that jump out at me from a quick look at the log you sent through:

  1. You’ve got a 4GB server, but you’re running Lucee with a maximum heap of 512MB. I’d adjust your -Xmx argument up to at least 2GB. 512MB is way too small, IMHO, for any kind of production server.
  2. I’m seeing entries that make it look like you are serving static assets with Tomcat (or at least some are). If so, you should really be serving those from your web server - especially if you’re using SSL. A quick change to your Apache or Nginx config will fix that.

The two above, alone, would make a huge impact. A few other suggestions:

  • Check all of your log file directories for correct permissions. I/O issues with directory and file permissions can contribute to these kinds of issues as threads wait around trying to do things they aren’t allowed to do.

  • Set your request timeout lower than the default 50 seconds - I’d start at 20 - unless you absolutely need it to be that high (e.g. file uploads, etc.). This way you can kill some of those threads faster if they start hanging, as your logs show they are.

  • Check a dump of your application scope (with a top argument of 3 or 4) when things start to slow down. It’s possible there are issues contributing to (literal) scope creep that are brought on by the production traffic from different clients.
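
For the log directory permissions check, here is a minimal sketch rather than anything definitive. The function name check_writable and the logs path are assumptions for illustration (the path is guessed from a standard Tomcat layout under /opt/lucee/tomcat); substitute the directories your install actually writes to, and run it as the account Tomcat runs under so the result reflects what the Lucee threads are allowed to do.

```shell
#!/bin/sh
# Report whether each given directory exists and is writable by the
# current user. Returns non-zero if any directory fails the check.
# check_writable is a hypothetical helper name, not part of Lucee/Tomcat.
check_writable() {
    status=0
    for dir in "$@"; do
        if [ -d "$dir" ] && [ -w "$dir" ]; then
            echo "ok: $dir"
        else
            echo "NOT writable: $dir"
            status=1
        fi
    done
    return "$status"
}

# Assumed path; replace with the log directories of your own install.
check_writable /opt/lucee/tomcat/logs || echo "check ownership and permissions"
```

Anything reported as not writable is worth fixing before tuning anything else, since it is cheap to rule out.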

HTH,

Jon

On December 17, 2015 at 7:52:16 PM, Michael Wood (@Michael_Wood) wrote:

I’m looking for help on how to prevent Lucee/Tomcat from crashing every 45 to 90 minutes. I need to get this Lucee server stabilized. It may be that I just need to do some configuration changes - bump up memory and the like but I’m a Lucee neophyte.

FYI - I built a new Lucee server on an AWS EC2 instance running Amazon Linux. It worked flawlessly in development. We did testing and even some load testing (though apparently not enough). Today we took it live and after a bit started getting “Connection Refused” messages from the website. It took me a bit but I figured out that the web server was down so I SSH’d into the server and restarted Lucee/TC and in 20 seconds it was back up. Unfortunately after 45 to 90 mins it would crash again.

In the afternoon after a crash I patched Lucee and that held for a couple hours but I think that just might have been because users lost confidence and the load was lessened. It has just crashed again.

I’ve attached the log file from one of the crashes.

Thanks in advance.

Love Lucee? Become a supporter and be part of the Lucee project today! - http://lucee.org/supporters/become-a-supporter.html

You received this message because you are subscribed to the Google Groups “Lucee” group.
To unsubscribe from this group and stop receiving emails from it, send an email to lucee+unsubscribe@googlegroups.com.
To post to this group, send email to lucee@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/lucee/1eb1c341-b1b3-4ee7-86f4-ed6e0939c6a6%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

The JVM arguments are set in /opt/lucee/tomcat/bin/setenv.sh like so:

CATALINA_OPTS="-Xms1024m -Xmx2048m -XX:MaxPermSize=512m -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -javaagent:lib/lucee-inst.jar";

As far as the Lucee process goes, Tomcat optimizations are Lucee optimizations, since Lucee runs inside Tomcat’s JVM. Request settings and the other settings handled through the admin are Lucee-specific.
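
After editing setenv.sh, Tomcat has to be restarted for the new values to take effect, and it is worth confirming which heap flags are really in play. The sketch below is only an illustration: heap_flags is a hypothetical helper name, and it does nothing more than pick the -Xms/-Xmx entries out of an options string.

```shell
#!/bin/sh
# Print the -Xms/-Xmx entries found in a JVM options string, one per line.
# $1 = options string, e.g. the value of CATALINA_OPTS.
# heap_flags is a hypothetical helper name used for illustration.
heap_flags() {
    printf '%s\n' "$1" | tr ' ' '\n' | grep -E '^-Xm[sx][0-9]+[kKmMgG]?$'
}

# Abbreviated version of the options shown above.
heap_flags "-Xms1024m -Xmx2048m -XX:MaxPermSize=512m"

# To inspect a running Tomcat instead (assumes its command line
# contains "catalina"), something like this should work:
# heap_flags "$(ps -eo args | grep '[c]atalina')"
```

If the running process still shows the old -Xmx value, the usual suspects are a setenv.sh that is not in the bin directory Tomcat was started from, or a restart that did not actually happen.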

Now that you’ve described the purpose of this server, I can see why you are dealing with long-running threads. More memory will help with this, and you’re going to need as much as possible to deal with the image manipulation and resizing.

Are you using s3fs (https://github.com/s3fs-fuse/s3fs-fuse) to mount the Amazon S3 buckets as part of the local file system or are you transmitting them to AWS? I think you’ll find the former offers a significant increase in speed, though it’s not without its quirks.

Even if you’re serving your images from AWS, you’re still going to decrease your load on the server by using a web server to serve your static JS and CSS files. Just because Tomcat can serve them doesn’t necessarily mean it should and, ideally, you want to minimize the number of requests it has to deal with so that it can focus on the real heavy lifting it needs to do.

On December 17, 2015 at 9:54:40 PM, Michael Wood (@Michael_Wood) wrote:

Thanks for these tips!

I’d love to adjust my -Xmx argument up but where exactly can I do that? I’ve been searching the web and the Lucee docs but I think I’m missing something obvious. Can you point me at something that talks about optimizations? And are we talking Lucee optimizations or Tomcat optimizations?

FYI - I’m actually just using this server very specifically as a place to upload image files through Lucee so that I can control what is uploaded, can resize/modify the images, and can then push them over to S3. It takes the load of uploading and image processing off of our main website, which has made a huge performance difference on the main web server.

Since what the server is being used for is so specific (not for web browsing) I didn’t install Apache; it’s just Lucee and Tomcat. Does that make sense?

On Thursday, December 17, 2015 at 5:18:59 PM UTC-8, Jon Clausen wrote:
A few things that jump out at me from a quick look at the log you sent through:

  1. You’ve got a 4GB server, but you’re running Lucee with a maximum heap of 512MB. I’d adjust your -Xmx argument up to at least 2GB. 512MB is way too small, IMHO, for any kind of production server.
  2. I’m seeing entries that make it look like you are serving static assets with Tomcat (or at least some are). If so, you should really be serving those from your web server - especially if you’re using SSL. A quick change to your Apache or Nginx config will fix that.

The two above, alone, would make a huge impact. A few other suggestions:

  • Check all of your log file directories for correct permissions. I/O issues with directory and file permissions can contribute to these kinds of issues as threads wait around trying to do things they aren’t allowed to do.

  • Set your request timeout lower than the default 50 seconds - I’d start at 20 - unless you absolutely need it to be that high (e.g. file uploads, etc.). This way you can kill some of those threads faster if they start hanging, as your logs show they are.

  • Check a dump of your application scope (with a top argument of 3 or 4) when things start to slow down. It’s possible there are issues contributing to (literal) scope creep that are brought on by the production traffic from different clients.

HTH,

Jon
