Silent crash ... process still on

Hi,

from time to time I got a front crash without explanation …

my setup is 2 loadbalancing VM in nginx with keepalive, and 2 others VM
with lucee (apache2/tomcat conf) : lucee 4.5.4.017

right now one of my node is dead, and don’t reply to any page call.

on the VM, I still got the process :
lucee 989 1 1 03:30 ? 00:05:39 /opt/lucee/jdk/jre/bin/java
-Djava.util.logging.config.file=/opt/lucee/tomcat/conf/logging.properties -
Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager -Xms256m -
Xmx512m -XX:MaxPermSize=128m -javaagent:lib/lucee-inst.jar -javaagent:/opt/
newrelic/newrelic.jar -Djava.endorsed.dirs=/opt/lucee/tomcat/endorsed -classpath
/opt/lucee/tomcat/bin/bootstrap.jar:/opt/lucee/tomcat/bin/tomcat-juli.jar -
Dcatalina.base=/opt/lucee/tomcat -Dcatalina.home=/opt/lucee/tomcat -Djava.io
.tmpdir=/opt/lucee/tomcat/temp org.apache.catalina.startup.Bootstrap start

Apache is ok I can see that it try to send request to lucee :

192.168.2.9 - - [03/Feb/2017:10:08:07 +0100] “GET
/monitoring/test.cfm?&nocache=575120 HTTP/1.1” 502 676 “-”
“libwww-perl/5.834”
192.168.2.9 - - [03/Feb/2017:10:08:35 +0100] “GET /api/V1.0/ping/binary
HTTP/1.0” 502 665 “-” “Mozilla/5.0 (compatible; Chirp/1.0;
+http://www.binarycanary.com/chirp.cfm)”
192.168.2.9 - - [03/Feb/2017:10:08:43 +0100] “GET /api/V1.0/ping/binary
HTTP/1.0” 502 665 “-” “Mozilla/5.0 (compatible; Chirp/1.0;
+http://www.binarycanary.com/chirp.cfm)”
192.168.2.10 - - [03/Feb/2017:10:09:02 +0100] “GET
/monitoring/test.cfm?&nocache=316214 HTTP/1.1” 502 676 “-”
“libwww-perl/5.834”
192.168.2.9 - - [03/Feb/2017:10:09:02 +0100] “GET
/monitoring/test.cfm?&nocache=286391 HTTP/1.1” 502 676 “-”
“libwww-perl/5.834”
192.168.2.10 - - [03/Feb/2017:10:09:07 +0100] “GET
/monitoring/test.cfm?&nocache=139938 HTTP/1.1” 502 676 “-”
“libwww-perl/5.834”
192.168.2.9 - - [03/Feb/2017:10:09:07 +0100] “GET
/monitoring/test.cfm?&nocache=146543 HTTP/1.1” 502 676 “-”
“libwww-perl/5.834”
192.168.2.9 - - [03/Feb/2017:10:09:35 +0100] “GET /api/V1.0/ping/binary
HTTP/1.0” 502 665 “-” “Mozilla/5.0 (compatible; Chirp/1.0;
+http://www.binarycanary.com/chirp.cfm)”
192.168.2.9 - - [03/Feb/2017:10:09:43 +0100] “GET /api/V1.0/ping/binary
HTTP/1.0” 502 665 “-” “Mozilla/5.0 (compatible; Chirp/1.0;
+http://www.binarycanary.com/chirp.cfm)”

and
Fri Feb 03 10:14:08.070572 2017] [proxy:error] [pid 9148:tid
140559440111360] [client 192.168.2.9:57315] AH00898: Error reading from
remote server returned by /monitoring/test.cfm
[Fri Feb 03 10:14:35.519512 2017] [proxy_http:error] [pid 9205:tid
140559322613504] (70007)The timeout specified has expired: [client
192.168.2.9:58900] AH01102: error reading status line from remote server
127.0.0.1:8888
[Fri Feb 03 10:14:35.519595 2017] [proxy:error] [pid 9205:tid
140559322613504] [client 192.168.2.9:58900] AH00898: Error reading from
remote server returned by /api/V1.0
[Fri Feb 03 10:14:43.591082 2017] [proxy_http:error] [pid 9205:tid
140559490467584] (70007)The timeout specified has expired: [client
192.168.2.9:59415] AH01102: error reading status line from remote server
127.0.0.1:8888
[Fri Feb 03 10:14:43.591202 2017] [proxy:error] [pid 9205:tid
140559490467584] [client 192.168.2.9:59415] AH00898: Error reading from
remote server returned by /api/V1.0

but the error log of lucee hasen’t moved since 5h04 this morning : it’s
10h15 here :
-rw-rw-r-- 1 lucee lucee 97194864 Feb 3 05:04 catalina.out
-rw-rw-r-- 1 lucee lucee 26937 Feb 3 05:04
localhost_access_log.2017-02-03.txt
-rw-rw-r-- 1 lucee lucee 5545 Feb 3 05:04 catalina.2017-02-03.log

03-Feb-2017 03:30:04.328 INFO [Thread-25]
org.apache.coyote.AbstractProtocol.pause Pausing ProtocolHandler
[“http-nio-8888”]
03-Feb-2017 03:30:04.386 INFO [Thread-25]
org.apache.coyote.AbstractProtocol.pause Pausing ProtocolHandler
[“ajp-nio-8009”]
03-Feb-2017 03:30:04.441 INFO [Thread-25]
org.apache.catalina.core.StandardService.stopInternal Stopping service
Catalina
03-Feb-2017 03:30:04.534 WARNING [api.bpreducer.com-startStop-2]
org.apache.catalina.loader.WebappClassLoaderBase.clearReferencesThreads The
web application [ROOT] appears to have started a thread named [Thread-22]
but has failed to stop it. This is very likely to create a memory leak.
Stack trace of thread:
java.lang.Thread.sleep(Native Method)
lucee.commons.io.SystemUtil.sleep(SystemUtil.java:623)
lucee.runtime.engine.Controler.run(Controler.java:132)
03-Feb-2017 03:30:04.543 WARNING [api.bpreducer.com-startStop-2]
org.apache.catalina.loader.WebappClassLoaderBase.clearReferencesThreads The
web application [ROOT] appears to have started a thread named
[Thread-1231551] but has failed to stop it. This is very likely to create a
memory leak. Stack trace of thread:
java.lang.Object.wait(Native Method)
lucee.commons.io.SystemUtil.wait(SystemUtil.java:646)
lucee.commons.io.retirement.RetireOutputStreamFactory$RetireThread.run(RetireOutputStreamFactory.java:57)
03-Feb-2017 03:30:04.677 INFO [Thread-25]
org.apache.coyote.AbstractProtocol.stop Stopping ProtocolHandler
[“http-nio-8888”]
03-Feb-2017 03:30:04.703 INFO [Thread-25]
org.apache.coyote.AbstractProtocol.stop Stopping ProtocolHandler
[“ajp-nio-8009”]
03-Feb-2017 03:30:04.737 INFO [Thread-25]
org.apache.coyote.AbstractProtocol.destroy Destroying ProtocolHandler
[“http-nio-8888”]
03-Feb-2017 03:30:04.771 INFO [Thread-25]
org.apache.coyote.AbstractProtocol.destroy Destroying ProtocolHandler
[“ajp-nio-8009”]
03-Feb-2017 03:30:51.085 INFO [main]
org.apache.catalina.core.AprLifecycleListener.lifecycleEvent The APR based
Apache Tomcat Native library which allows optimal performance in production
environments was not found on the java.library.path:
/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib
03-Feb-2017 03:30:51.449 INFO [main]
org.apache.coyote.AbstractProtocol.init Initializing ProtocolHandler
[“http-nio-8888”]
03-Feb-2017 03:30:51.506 INFO [main]
org.apache.tomcat.util.net.NioSelectorPool.getSharedSelector Using a shared
selector for servlet write/read
03-Feb-2017 03:30:51.517 INFO [main]
org.apache.coyote.AbstractProtocol.init Initializing ProtocolHandler
[“ajp-nio-8009”]
03-Feb-2017 03:30:51.554 INFO [main]
org.apache.tomcat.util.net.NioSelectorPool.getSharedSelector Using a shared
selector for servlet write/read
03-Feb-2017 03:30:51.563 INFO [main]
org.apache.catalina.startup.Catalina.load Initialization processed in 3425
ms
03-Feb-2017 03:30:51.634 INFO [main]
org.apache.catalina.core.StandardService.startInternal Starting service
Catalina
03-Feb-2017 03:30:51.635 INFO [main]
org.apache.catalina.core.StandardEngine.startInternal Starting Servlet
Engine: Apache Tomcat/8.0.24
03-Feb-2017 03:31:04.670 INFO [api.bpreducer.com-startStop-1]
org.apache.jasper.servlet.TldScanner.scanJars At least one JAR was scanned
for TLDs yet contained no TLDs. Enable debug logging for this logger for a
complete list of JARs that were scanned but no TLDs were found in them.
Skipping unneeded JARs during scanning can improve startup time and JSP
compilation time.
03-Feb-2017 03:31:08.543 INFO [127.0.0.1-startStop-1]
org.apache.catalina.startup.HostConfig.deployDirectory Deploying web
application directory /opt/lucee/tomcat/webapps/ROOT
03-Feb-2017 03:31:17.501 INFO [127.0.0.1-startStop-1]
org.apache.jasper.servlet.TldScanner.scanJars At least one JAR was scanned
for TLDs yet contained no TLDs. Enable debug logging for this logger for a
complete list of JARs that were scanned but no TLDs were found in them.
Skipping unneeded JARs during scanning can improve startup time and JSP
compilation time.
03-Feb-2017 03:31:17.659 INFO [127.0.0.1-startStop-1]
org.apache.catalina.startup.HostConfig.deployDirectory Deployment of web
application directory /opt/lucee/tomcat/webapps/ROOT has finished in 9,115
ms
03-Feb-2017 03:31:17.666 INFO [main]
org.apache.coyote.AbstractProtocol.start Starting ProtocolHandler
[“http-nio-8888”]
03-Feb-2017 03:31:17.687 INFO [main]
org.apache.coyote.AbstractProtocol.start Starting ProtocolHandler
[“ajp-nio-8009”]
03-Feb-2017 03:31:17.689 INFO [main]
org.apache.catalina.startup.Catalina.start Server startup in 26126 ms
03-Feb-2017 03:31:18.366 INFO
[api-bpreducer-com-api-smtp-confl2-startStop-1]
org.apache.catalina.startup.HostConfig.deployDescriptor Deploying
configuration descriptor
/opt/lucee/tomcat/conf/Catalina/api-bpreducer-com-api-smtp-confl2/ROOT.xml
03-Feb-2017 03:31:18.508 INFO
[api-bpreducer-com-api-smtp-confl2-startStop-1]
org.apache.catalina.startup.HostConfig.deployDescriptor Deployment of
configuration descriptor
/opt/lucee/tomcat/conf/Catalina/api-bpreducer-com-api-smtp-confl2/ROOT.xml
has finished in 142 ms
03-Feb-2017 05:04:01.527 SEVERE [http-nio-8888-exec-30]
com.microsoft.sqlserver.jdbc.TDSParser.throwUnexpectedTokenException
ConnectionID:229 ClientConnectionId: ea3556d2-24d6-473b-9f80-9adbce53dcc6:
getNextResult: Encountered unexpected unknown token (0x0)
03-Feb-2017 05:04:01.527 SEVERE [http-nio-8888-exec-30]
com.microsoft.sqlserver.jdbc.TDSReader.throwInvalidTDSToken
ConnectionID:229 ClientConnectionId: ea3556d2-24d6-473b-9f80-9adbce53dcc6
got unexpected value in TDS response at offset:0

I got a reboot at 3h30 for this VM !

I can see a pb with a query on a MSSQL server but this should CRASH the
server or at least should kill the process … (that monit would restart)

can you find why it crash and how I could correct it or setup monit to
detect the crash and restart the lucee service …

Thanks for your help !!!

Hopefully someone can give you insight into the real issue of the silent
crash, but as for monit you can use something like this to check Tomcasts
port directly https://gist.github.com/joeyimbasciano/3996208On Friday, February 3, 2017 at 1:19:45 AM UTC-8, Stéphane MERLE wrote:

Hi,

from time to time I got a front crash without explanation …

my setup is 2 loadbalancing VM in nginx with keepalive, and 2 others VM
with lucee (apache2/tomcat conf) : lucee 4.5.4.017

right now one of my node is dead, and don’t reply to any page call.

on the VM, I still got the process :
lucee 989 1 1 03:30 ? 00:05:39 /opt/lucee/jdk/jre/bin/java
-Djava.util.logging.config.file=/opt/lucee/tomcat/conf/logging.properties
-Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager -Xms256m
-Xmx512m -XX:MaxPermSize=128m -javaagent:lib/lucee-inst.jar -javaagent:
/opt/newrelic/newrelic.jar -Djava.endorsed.dirs=/opt/lucee/tomcat/endorsed
-classpath /opt/lucee/tomcat/bin/bootstrap.jar:/opt/lucee/tomcat/bin/
tomcat-juli.jar -Dcatalina.base=/opt/lucee/tomcat -Dcatalina.home=/opt/
lucee/tomcat -Djava.io.tmpdir=/opt/lucee/tomcat/temp org.apache.catalina.
startup.Bootstrap start

Apache is ok I can see that it try to send request to lucee :

192.168.2.9 - - [03/Feb/2017:10:08:07 +0100] “GET
/monitoring/test.cfm?&nocache=575120 HTTP/1.1” 502 676 “-”
“libwww-perl/5.834”
192.168.2.9 - - [03/Feb/2017:10:08:35 +0100] “GET /api/V1.0/ping/binary
HTTP/1.0” 502 665 “-” “Mozilla/5.0 (compatible; Chirp/1.0; +
http://www.binarycanary.com/chirp.cfm)”
192.168.2.9 - - [03/Feb/2017:10:08:43 +0100] “GET /api/V1.0/ping/binary
HTTP/1.0” 502 665 “-” “Mozilla/5.0 (compatible; Chirp/1.0; +
http://www.binarycanary.com/chirp.cfm)”
192.168.2.10 - - [03/Feb/2017:10:09:02 +0100] “GET
/monitoring/test.cfm?&nocache=316214 HTTP/1.1” 502 676 “-”
“libwww-perl/5.834”
192.168.2.9 - - [03/Feb/2017:10:09:02 +0100] “GET
/monitoring/test.cfm?&nocache=286391 HTTP/1.1” 502 676 “-”
“libwww-perl/5.834”
192.168.2.10 - - [03/Feb/2017:10:09:07 +0100] “GET
/monitoring/test.cfm?&nocache=139938 HTTP/1.1” 502 676 “-”
“libwww-perl/5.834”
192.168.2.9 - - [03/Feb/2017:10:09:07 +0100] “GET
/monitoring/test.cfm?&nocache=146543 HTTP/1.1” 502 676 “-”
“libwww-perl/5.834”
192.168.2.9 - - [03/Feb/2017:10:09:35 +0100] “GET /api/V1.0/ping/binary
HTTP/1.0” 502 665 “-” “Mozilla/5.0 (compatible; Chirp/1.0; +
http://www.binarycanary.com/chirp.cfm)”
192.168.2.9 - - [03/Feb/2017:10:09:43 +0100] “GET /api/V1.0/ping/binary
HTTP/1.0” 502 665 “-” “Mozilla/5.0 (compatible; Chirp/1.0; +
http://www.binarycanary.com/chirp.cfm)”

and
Fri Feb 03 10:14:08.070572 2017] [proxy:error] [pid 9148:tid
140559440111360] [client 192.168.2.9:57315] AH00898: Error reading from
remote server returned by /monitoring/test.cfm
[Fri Feb 03 10:14:35.519512 2017] [proxy_http:error] [pid 9205:tid
140559322613504] (70007)The timeout specified has expired: [client
192.168.2.9:58900] AH01102: error reading status line from remote server
127.0.0.1:8888
[Fri Feb 03 10:14:35.519595 2017] [proxy:error] [pid 9205:tid
140559322613504] [client 192.168.2.9:58900] AH00898: Error reading from
remote server returned by /api/V1.0
[Fri Feb 03 10:14:43.591082 2017] [proxy_http:error] [pid 9205:tid
140559490467584] (70007)The timeout specified has expired: [client
192.168.2.9:59415] AH01102: error reading status line from remote server
127.0.0.1:8888
[Fri Feb 03 10:14:43.591202 2017] [proxy:error] [pid 9205:tid
140559490467584] [client 192.168.2.9:59415] AH00898: Error reading from
remote server returned by /api/V1.0

but the error log of lucee hasen’t moved since 5h04 this morning : it’s
10h15 here :
-rw-rw-r-- 1 lucee lucee 97194864 Feb 3 05:04 catalina.out
-rw-rw-r-- 1 lucee lucee 26937 Feb 3 05:04
localhost_access_log.2017-02-03.txt
-rw-rw-r-- 1 lucee lucee 5545 Feb 3 05:04 catalina.2017-02-03.log

03-Feb-2017 03:30:04.328 INFO [Thread-25]
org.apache.coyote.AbstractProtocol.pause Pausing ProtocolHandler
[“http-nio-8888”]
03-Feb-2017 03:30:04.386 INFO [Thread-25]
org.apache.coyote.AbstractProtocol.pause Pausing ProtocolHandler
[“ajp-nio-8009”]
03-Feb-2017 03:30:04.441 INFO [Thread-25]
org.apache.catalina.core.StandardService.stopInternal Stopping service
Catalina
03-Feb-2017 03:30:04.534 WARNING [api.bpreducer.com-startStop-2]
org.apache.catalina.loader.WebappClassLoaderBase.clearReferencesThreads The
web application [ROOT] appears to have started a thread named [Thread-22]
but has failed to stop it. This is very likely to create a memory leak.
Stack trace of thread:
java.lang.Thread.sleep(Native Method)
lucee.commons.io.SystemUtil.sleep(SystemUtil.java:623)
lucee.runtime.engine.Controler.run(Controler.java:132)
03-Feb-2017 03:30:04.543 WARNING [api.bpreducer.com-startStop-2]
org.apache.catalina.loader.WebappClassLoaderBase.clearReferencesThreads The
web application [ROOT] appears to have started a thread named
[Thread-1231551] but has failed to stop it. This is very likely to create a
memory leak. Stack trace of thread:
java.lang.Object.wait(Native Method)
lucee.commons.io.SystemUtil.wait(SystemUtil.java:646)

lucee.commons.io.retirement.RetireOutputStreamFactory$RetireThread.run(RetireOutputStreamFactory.java:57)
03-Feb-2017 03:30:04.677 INFO [Thread-25]
org.apache.coyote.AbstractProtocol.stop Stopping ProtocolHandler
[“http-nio-8888”]
03-Feb-2017 03:30:04.703 INFO [Thread-25]
org.apache.coyote.AbstractProtocol.stop Stopping ProtocolHandler
[“ajp-nio-8009”]
03-Feb-2017 03:30:04.737 INFO [Thread-25]
org.apache.coyote.AbstractProtocol.destroy Destroying ProtocolHandler
[“http-nio-8888”]
03-Feb-2017 03:30:04.771 INFO [Thread-25]
org.apache.coyote.AbstractProtocol.destroy Destroying ProtocolHandler
[“ajp-nio-8009”]
03-Feb-2017 03:30:51.085 INFO [main]
org.apache.catalina.core.AprLifecycleListener.lifecycleEvent The APR based
Apache Tomcat Native library which allows optimal performance in production
environments was not found on the java.library.path:
/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib
03-Feb-2017 03:30:51.449 INFO [main]
org.apache.coyote.AbstractProtocol.init Initializing ProtocolHandler
[“http-nio-8888”]
03-Feb-2017 03:30:51.506 INFO [main]
org.apache.tomcat.util.net.NioSelectorPool.getSharedSelector Using a shared
selector for servlet write/read
03-Feb-2017 03:30:51.517 INFO [main]
org.apache.coyote.AbstractProtocol.init Initializing ProtocolHandler
[“ajp-nio-8009”]
03-Feb-2017 03:30:51.554 INFO [main]
org.apache.tomcat.util.net.NioSelectorPool.getSharedSelector Using a shared
selector for servlet write/read
03-Feb-2017 03:30:51.563 INFO [main]
org.apache.catalina.startup.Catalina.load Initialization processed in 3425
ms
03-Feb-2017 03:30:51.634 INFO [main]
org.apache.catalina.core.StandardService.startInternal Starting service
Catalina
03-Feb-2017 03:30:51.635 INFO [main]
org.apache.catalina.core.StandardEngine.startInternal Starting Servlet
Engine: Apache Tomcat/8.0.24
03-Feb-2017 03:31:04.670 INFO [api.bpreducer.com-startStop-1]
org.apache.jasper.servlet.TldScanner.scanJars At least one JAR was scanned
for TLDs yet contained no TLDs. Enable debug logging for this logger for a
complete list of JARs that were scanned but no TLDs were found in them.
Skipping unneeded JARs during scanning can improve startup time and JSP
compilation time.
03-Feb-2017 03:31:08.543 INFO [127.0.0.1-startStop-1]
org.apache.catalina.startup.HostConfig.deployDirectory Deploying web
application directory /opt/lucee/tomcat/webapps/ROOT
03-Feb-2017 03:31:17.501 INFO [127.0.0.1-startStop-1]
org.apache.jasper.servlet.TldScanner.scanJars At least one JAR was scanned
for TLDs yet contained no TLDs. Enable debug logging for this logger for a
complete list of JARs that were scanned but no TLDs were found in them.
Skipping unneeded JARs during scanning can improve startup time and JSP
compilation time.
03-Feb-2017 03:31:17.659 INFO [127.0.0.1-startStop-1]
org.apache.catalina.startup.HostConfig.deployDirectory Deployment of web
application directory /opt/lucee/tomcat/webapps/ROOT has finished in 9,115
ms
03-Feb-2017 03:31:17.666 INFO [main]
org.apache.coyote.AbstractProtocol.start Starting ProtocolHandler
[“http-nio-8888”]
03-Feb-2017 03:31:17.687 INFO [main]
org.apache.coyote.AbstractProtocol.start Starting ProtocolHandler
[“ajp-nio-8009”]
03-Feb-2017 03:31:17.689 INFO [main]
org.apache.catalina.startup.Catalina.start Server startup in 26126 ms
03-Feb-2017 03:31:18.366 INFO
[api-bpreducer-com-api-smtp-confl2-startStop-1]
org.apache.catalina.startup.HostConfig.deployDescriptor Deploying
configuration descriptor
/opt/lucee/tomcat/conf/Catalina/api-bpreducer-com-api-smtp-confl2/ROOT.xml
03-Feb-2017 03:31:18.508 INFO
[api-bpreducer-com-api-smtp-confl2-startStop-1]
org.apache.catalina.startup.HostConfig.deployDescriptor Deployment of
configuration descriptor
/opt/lucee/tomcat/conf/Catalina/api-bpreducer-com-api-smtp-confl2/ROOT.xml
has finished in 142 ms
03-Feb-2017 05:04:01.527 SEVERE [http-nio-8888-exec-30]
com.microsoft.sqlserver.jdbc.TDSParser.throwUnexpectedTokenException
ConnectionID:229 ClientConnectionId: ea3556d2-24d6-473b-9f80-9adbce53dcc6:
getNextResult: Encountered unexpected unknown token (0x0)
03-Feb-2017 05:04:01.527 SEVERE [http-nio-8888-exec-30]
com.microsoft.sqlserver.jdbc.TDSReader.throwInvalidTDSToken
ConnectionID:229 ClientConnectionId: ea3556d2-24d6-473b-9f80-9adbce53dcc6
got unexpected value in TDS response at offset:0

I got a reboot at 3h30 for this VM !

I can see a pb with a query on a MSSQL server but this should CRASH the
server or at least should kill the process … (that monit would restart)

can you find why it crash and how I could correct it or setup monit to
detect the crash and restart the lucee service …

Thanks for your help !!!

Hi,
thanks for you help …

I already got this for monit :
check process tomcat with pidfile /opt/lucee/tomcat/work/tomcat.pid
start “/etc/init.d/lucee_ctl start”
stop “/etc/init.d/lucee_ctl stop”
restart “/etc/init.d/lucee_ctl restart”
if failed port 8888 then alert
if failed port 8888 for 5 cycles then restart
if failed port 8888 for 10 cycles then exec /sbin/reboot

and it’s not triggering …

I can see socket on port 8888 :

tcp 415 0 127.0.0.1:8888 127.0.0.1:46667
CLOSE_WAIT 999 229261 1050/java
tcp 1 0 127.0.0.1:8888 127.0.0.1:45879
CLOSE_WAIT 999 224405 1050/java
tcp 415 0 127.0.0.1:8888 127.0.0.1:46675
CLOSE_WAIT 999 229316 1050/java
tcp 414 0 127.0.0.1:8888 127.0.0.1:46056
CLOSE_WAIT 999 225484 1050/java
tcp 0 0 127.0.0.1:47282 127.0.0.1:8888
FIN_WAIT2 0 0 -
tcp 496 0 127.0.0.1:8888 127.0.0.1:46562
CLOSE_WAIT 999 228624 1050/java
tcp 496 0 127.0.0.1:8888 127.0.0.1:45890
CLOSE_WAIT 999 224466 1050/java
tcp 496 0 127.0.0.1:8888 127.0.0.1:45851
CLOSE_WAIT 999 224225 1050/java
tcp 1 0 127.0.0.1:8888 127.0.0.1:46660
CLOSE_WAIT 999 229225 1050/java
tcp 296 0 127.0.0.1:8888 127.0.0.1:44622
CLOSE_WAIT 999 216563 1050/java
tcp 496 0 127.0.0.1:8888 127.0.0.1:45031
CLOSE_WAIT 999 218802 1050/java
tcp 303 0 127.0.0.1:8888 127.0.0.1:44477
CLOSE_WAIT 999 215959 1050/java
tcp 414 0 127.0.0.1:8888 127.0.0.1:46000
CLOSE_WAIT 999 225158 1050/java
tcp 414 0 127.0.0.1:8888 127.0.0.1:45782
CLOSE_WAIT 999 223801 1050/java
tcp 414 0 127.0.0.1:8888 127.0.0.1:45724
CLOSE_WAIT 999 223448 1050/java
tcp 414 0 127.0.0.1:8888 127.0.0.1:47330
ESTABLISHED 999 236796 1050/java
tcp 496 0 127.0.0.1:8888 127.0.0.1:45195
CLOSE_WAIT 999 219808 1050/java
tcp 415 0 127.0.0.1:8888 127.0.0.1:46114
CLOSE_WAIT 999 225855 1050/java

I have no clue of what is stucking …Le vendredi 3 février 2017 18:24:49 UTC+1, Don Quist a écrit :

Hopefully someone can give you insight into the real issue of the silent
crash, but as for monit you can use something like this to check Tomcasts
port directly https://gist.github.com/joeyimbasciano/3996208

On Friday, February 3, 2017 at 1:19:45 AM UTC-8, Stéphane MERLE wrote:

Hi,

from time to time I got a front crash without explanation …

my setup is 2 loadbalancing VM in nginx with keepalive, and 2 others VM
with lucee (apache2/tomcat conf) : lucee 4.5.4.017

right now one of my node is dead, and don’t reply to any page call.

on the VM, I still got the process :
lucee 989 1 1 03:30 ? 00:05:39 /opt/lucee/jdk/jre/bin/java
-Djava.util.logging.config.file=/opt/lucee/tomcat/conf/logging.properties
-Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager -
Xms256m -Xmx512m -XX:MaxPermSize=128m -javaagent:lib/lucee-inst.jar -
javaagent:/opt/newrelic/newrelic.jar -Djava.endorsed.dirs=/opt/lucee/
tomcat/endorsed -classpath /opt/lucee/tomcat/bin/bootstrap.jar:/opt/lucee
/tomcat/bin/tomcat-juli.jar -Dcatalina.base=/opt/lucee/tomcat -Dcatalina.
home=/opt/lucee/tomcat -Djava.io.tmpdir=/opt/lucee/tomcat/temp org.apache
.catalina.startup.Bootstrap start

Apache is ok I can see that it try to send request to lucee :

192.168.2.9 - - [03/Feb/2017:10:08:07 +0100] “GET
/monitoring/test.cfm?&nocache=575120 HTTP/1.1” 502 676 “-”
“libwww-perl/5.834”
192.168.2.9 - - [03/Feb/2017:10:08:35 +0100] “GET /api/V1.0/ping/binary
HTTP/1.0” 502 665 “-” “Mozilla/5.0 (compatible; Chirp/1.0; +
http://www.binarycanary.com/chirp.cfm)”
192.168.2.9 - - [03/Feb/2017:10:08:43 +0100] “GET /api/V1.0/ping/binary
HTTP/1.0” 502 665 “-” “Mozilla/5.0 (compatible; Chirp/1.0; +
http://www.binarycanary.com/chirp.cfm)”
192.168.2.10 - - [03/Feb/2017:10:09:02 +0100] “GET
/monitoring/test.cfm?&nocache=316214 HTTP/1.1” 502 676 “-”
“libwww-perl/5.834”
192.168.2.9 - - [03/Feb/2017:10:09:02 +0100] “GET
/monitoring/test.cfm?&nocache=286391 HTTP/1.1” 502 676 “-”
“libwww-perl/5.834”
192.168.2.10 - - [03/Feb/2017:10:09:07 +0100] “GET
/monitoring/test.cfm?&nocache=139938 HTTP/1.1” 502 676 “-”
“libwww-perl/5.834”
192.168.2.9 - - [03/Feb/2017:10:09:07 +0100] “GET
/monitoring/test.cfm?&nocache=146543 HTTP/1.1” 502 676 “-”
“libwww-perl/5.834”
192.168.2.9 - - [03/Feb/2017:10:09:35 +0100] “GET /api/V1.0/ping/binary
HTTP/1.0” 502 665 “-” “Mozilla/5.0 (compatible; Chirp/1.0; +
http://www.binarycanary.com/chirp.cfm)”
192.168.2.9 - - [03/Feb/2017:10:09:43 +0100] “GET /api/V1.0/ping/binary
HTTP/1.0” 502 665 “-” “Mozilla/5.0 (compatible; Chirp/1.0; +
http://www.binarycanary.com/chirp.cfm)”

and
Fri Feb 03 10:14:08.070572 2017] [proxy:error] [pid 9148:tid
140559440111360] [client 192.168.2.9:57315] AH00898: Error reading from
remote server returned by /monitoring/test.cfm
[Fri Feb 03 10:14:35.519512 2017] [proxy_http:error] [pid 9205:tid
140559322613504] (70007)The timeout specified has expired: [client
192.168.2.9:58900] AH01102: error reading status line from remote server
127.0.0.1:8888
[Fri Feb 03 10:14:35.519595 2017] [proxy:error] [pid 9205:tid
140559322613504] [client 192.168.2.9:58900] AH00898: Error reading from
remote server returned by /api/V1.0
[Fri Feb 03 10:14:43.591082 2017] [proxy_http:error] [pid 9205:tid
140559490467584] (70007)The timeout specified has expired: [client
192.168.2.9:59415] AH01102: error reading status line from remote server
127.0.0.1:8888
[Fri Feb 03 10:14:43.591202 2017] [proxy:error] [pid 9205:tid
140559490467584] [client 192.168.2.9:59415] AH00898: Error reading from
remote server returned by /api/V1.0

but the error log of lucee hasen’t moved since 5h04 this morning : it’s
10h15 here :
-rw-rw-r-- 1 lucee lucee 97194864 Feb 3 05:04 catalina.out
-rw-rw-r-- 1 lucee lucee 26937 Feb 3 05:04
localhost_access_log.2017-02-03.txt
-rw-rw-r-- 1 lucee lucee 5545 Feb 3 05:04 catalina.2017-02-03.log

03-Feb-2017 03:30:04.328 INFO [Thread-25]
org.apache.coyote.AbstractProtocol.pause Pausing ProtocolHandler
[“http-nio-8888”]
03-Feb-2017 03:30:04.386 INFO [Thread-25]
org.apache.coyote.AbstractProtocol.pause Pausing ProtocolHandler
[“ajp-nio-8009”]
03-Feb-2017 03:30:04.441 INFO [Thread-25]
org.apache.catalina.core.StandardService.stopInternal Stopping service
Catalina
03-Feb-2017 03:30:04.534 WARNING [api.bpreducer.com-startStop-2]
org.apache.catalina.loader.WebappClassLoaderBase.clearReferencesThreads The
web application [ROOT] appears to have started a thread named [Thread-22]
but has failed to stop it. This is very likely to create a memory leak.
Stack trace of thread:
java.lang.Thread.sleep(Native Method)
lucee.commons.io.SystemUtil.sleep(SystemUtil.java:623)
lucee.runtime.engine.Controler.run(Controler.java:132)
03-Feb-2017 03:30:04.543 WARNING [api.bpreducer.com-startStop-2]
org.apache.catalina.loader.WebappClassLoaderBase.clearReferencesThreads The
web application [ROOT] appears to have started a thread named
[Thread-1231551] but has failed to stop it. This is very likely to create a
memory leak. Stack trace of thread:
java.lang.Object.wait(Native Method)
lucee.commons.io.SystemUtil.wait(SystemUtil.java:646)

lucee.commons.io.retirement.RetireOutputStreamFactory$RetireThread.run(RetireOutputStreamFactory.java:57)
03-Feb-2017 03:30:04.677 INFO [Thread-25]
org.apache.coyote.AbstractProtocol.stop Stopping ProtocolHandler
[“http-nio-8888”]
03-Feb-2017 03:30:04.703 INFO [Thread-25]
org.apache.coyote.AbstractProtocol.stop Stopping ProtocolHandler
[“ajp-nio-8009”]
03-Feb-2017 03:30:04.737 INFO [Thread-25]
org.apache.coyote.AbstractProtocol.destroy Destroying ProtocolHandler
[“http-nio-8888”]
03-Feb-2017 03:30:04.771 INFO [Thread-25]
org.apache.coyote.AbstractProtocol.destroy Destroying ProtocolHandler
[“ajp-nio-8009”]
03-Feb-2017 03:30:51.085 INFO [main]
org.apache.catalina.core.AprLifecycleListener.lifecycleEvent The APR based
Apache Tomcat Native library which allows optimal performance in production
environments was not found on the java.library.path:
/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib
03-Feb-2017 03:30:51.449 INFO [main]
org.apache.coyote.AbstractProtocol.init Initializing ProtocolHandler
[“http-nio-8888”]
03-Feb-2017 03:30:51.506 INFO [main]
org.apache.tomcat.util.net.NioSelectorPool.getSharedSelector Using a shared
selector for servlet write/read
03-Feb-2017 03:30:51.517 INFO [main]
org.apache.coyote.AbstractProtocol.init Initializing ProtocolHandler
[“ajp-nio-8009”]
03-Feb-2017 03:30:51.554 INFO [main]
org.apache.tomcat.util.net.NioSelectorPool.getSharedSelector Using a shared
selector for servlet write/read
03-Feb-2017 03:30:51.563 INFO [main]
org.apache.catalina.startup.Catalina.load Initialization processed in 3425
ms
03-Feb-2017 03:30:51.634 INFO [main]
org.apache.catalina.core.StandardService.startInternal Starting service
Catalina
03-Feb-2017 03:30:51.635 INFO [main]
org.apache.catalina.core.StandardEngine.startInternal Starting Servlet
Engine: Apache Tomcat/8.0.24
03-Feb-2017 03:31:04.670 INFO [api.bpreducer.com-startStop-1]
org.apache.jasper.servlet.TldScanner.scanJars At least one JAR was scanned
for TLDs yet contained no TLDs. Enable debug logging for this logger for a
complete list of JARs that were scanned but no TLDs were found in them.
Skipping unneeded JARs during scanning can improve startup time and JSP
compilation time.
03-Feb-2017 03:31:08.543 INFO [127.0.0.1-startStop-1]
org.apache.catalina.startup.HostConfig.deployDirectory Deploying web
application directory /opt/lucee/tomcat/webapps/ROOT
03-Feb-2017 03:31:17.501 INFO [127.0.0.1-startStop-1]
org.apache.jasper.servlet.TldScanner.scanJars At least one JAR was scanned
for TLDs yet contained no TLDs. Enable debug logging for this logger for a
complete list of JARs that were scanned but no TLDs were found in them.
Skipping unneeded JARs during scanning can improve startup time and JSP
compilation time.
03-Feb-2017 03:31:17.659 INFO [127.0.0.1-startStop-1]
org.apache.catalina.startup.HostConfig.deployDirectory Deployment of web
application directory /opt/lucee/tomcat/webapps/ROOT has finished in 9,115
ms
03-Feb-2017 03:31:17.666 INFO [main]
org.apache.coyote.AbstractProtocol.start Starting ProtocolHandler
[“http-nio-8888”]
03-Feb-2017 03:31:17.687 INFO [main]
org.apache.coyote.AbstractProtocol.start Starting ProtocolHandler
[“ajp-nio-8009”]
03-Feb-2017 03:31:17.689 INFO [main]
org.apache.catalina.startup.Catalina.start Server startup in 26126 ms
03-Feb-2017 03:31:18.366 INFO
[api-bpreducer-com-api-smtp-confl2-startStop-1]
org.apache.catalina.startup.HostConfig.deployDescriptor Deploying
configuration descriptor
/opt/lucee/tomcat/conf/Catalina/api-bpreducer-com-api-smtp-confl2/ROOT.xml
03-Feb-2017 03:31:18.508 INFO
[api-bpreducer-com-api-smtp-confl2-startStop-1]
org.apache.catalina.startup.HostConfig.deployDescriptor Deployment of
configuration descriptor
/opt/lucee/tomcat/conf/Catalina/api-bpreducer-com-api-smtp-confl2/ROOT.xml
has finished in 142 ms
03-Feb-2017 05:04:01.527 SEVERE [http-nio-8888-exec-30]
com.microsoft.sqlserver.jdbc.TDSParser.throwUnexpectedTokenException
ConnectionID:229 ClientConnectionId: ea3556d2-24d6-473b-9f80-9adbce53dcc6:
getNextResult: Encountered unexpected unknown token (0x0)
03-Feb-2017 05:04:01.527 SEVERE [http-nio-8888-exec-30]
com.microsoft.sqlserver.jdbc.TDSReader.throwInvalidTDSToken
ConnectionID:229 ClientConnectionId: ea3556d2-24d6-473b-9f80-9adbce53dcc6
got unexpected value in TDS response at offset:0

I got a reboot at 3h30 for this VM !

I can see a pb with a query on a MSSQL server but this should CRASH the
server or at least should kill the process … (that monit would restart)

can you find why it crash and how I could correct it or setup monit to
detect the crash and restart the lucee service …

Thanks for your help !!!