5.3.9 stops logging while in a docker container

OS: Ubuntu 18.04
Docker: 19.03.6
Image: 5.3.9.133-tomcat9.0-jdk11-openjdk

I’ve tried a wide variety of configurations. No matter what I do, logging stops after some period of time (usually around 25 minutes). Lucee continues to run in the container (I can see it doing the work I want it to do), but no logs are generated.

I have not spun up 5.3.9 outside of Docker, so I don’t know whether this is a problem with 5.3.9 itself or with the dockerization of it.

Anyone else experiencing this unknown stoppage?

B.

Somewhat related to

@Brian_Harcourt Are you hitting this 5.3.9 regression?

https://luceeserver.atlassian.net/browse/LDEV-3978

Note, that ticket is fixed in the 5.3.9.137 snapshot build.


Hmm - @bdw429s - thanks for the idea but it doesn’t look like that’s it. (Unless I built the 5.3.9.137 container incorrectly).

I’ve tried A/B testing so many configurations that I’m sure I’m just missing something. At this point I’m just trying to follow any documentation I can find; namely:


<system err="default" out="default" />

in lucee-server.xml, and the log of concern defined in lucee-web.xml.cfm using the config injected by the Admin page; specifically:


  <logger appender-arguments="streamtype:output" appender-class="lucee.commons.io.log.log4j2.appender.ConsoleAppender" 
    layout-arguments="pattern:&quot;%p&quot;,&quot;%d{yyyy-MM-dd HH:mm:ssXXX}&quot;,&quot;%t&quot;,&quot;%c{1}&quot;,&quot;%m&quot;%n" 
    layout-class="org.apache.logging.log4j.core.layout.PatternLayout" level="info" name="taskqueue"/>

I cribbed the pattern from a previous working config; maybe Log4j2 doesn’t like patterns rendered this way?

In short, any ideas you could point me to would be appreciated.

Thanks,
B.

@Terry_Whitney - interesting note! I discounted this as not the issue since Tomcat 9 isn’t a systemd service in my container; it’s just ‘catalina.sh run’ in the foreground. But this looks like a place to start trying to understand STDOUT in containers - which seems to be what I’m not understanding.

Even if it’s not the cause, running a secure, performance-enhanced, bug-free version of Tomcat will only help you further troubleshoot the issue.

Just spit-balling in public... here’s my current theory:
This is a task engine. Once per minute a scheduled task makes a request to a script that says “run tasks”.

The request sets down an application-scoped marker, then finds a task and processes it to completion, then finds another task and processes it to completion, and so on, until it either runs out of tasks or sees that another request has started. (If, on completing a task, it finds the application marker has changed, then a new request has begun and this request ends.) The request timeout is 10 minutes. I don’t believe any requests actually time out; they simply see the hand-off and stop.
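Roughly this shape, as a sketch only (getNextTask() and processTask() are hypothetical stand-ins, not the real engine code):

<cfscript>
// Sketch of the hand-off described above -- not the actual engine code.
// Each "run tasks" request records its own id in the application scope;
// an older request stops as soon as it sees a newer request has taken over.
myRunId = createUUID();
application.currentRunId = myRunId;        // application-scoped marker

task = getNextTask();                      // hypothetical helper
while ( !isNull( task ) ) {
    processTask( task );                   // hypothetical helper; runs to completion
    writeLog( text="completed a task", log="taskqueue" );

    // A newer request has set the marker: hand off and end this request.
    if ( application.currentRunId != myRunId ) break;

    task = getNextTask();
}
</cfscript>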
Thus, a second thread may be logging at the same time a previous thread is logging.
This happens all the time so I can’t imagine this would be much of an issue.

But the task engines with higher loads (where requests are more likely to overlap) are the engines that stop logging. So I’m wondering about the new Log4j2 infrastructure and whether it might be frustrated by a thread being shut down when there is no response.

The last log record that comes out is always
org.apache.coyote.AbstractProtocol.destroy Destroying ProtocolHandler ["http-nio-8888"]

If this spurs thoughts in anyone as to how to investigate further, they’d be welcome.

Digging here occasionally; just leaving notes.

The one thing I see that the failing container does, which the non-failing containers don’t, is instantiate a java.net.URL.
I spun up another container while developing a different task handler; that one also instantiates a java.net.URL, and it too stops logging after a few minutes.

I may try and spin up fuller test cases if I find time.

Just notes on Virtualization.

VMware ESXi & vSphere - Thousands of instances, no issues until there is a massive update from VMware. ESXi has a free version; I highly recommend it if you want to do what the “Enterprise Gold Standard” kids are doing.

Xen (Citrix or open source) - No issues until a server or cluster loses power, then minor issues powering up some instances running older OSes. Both the free and paid versions run as fast as the hardware you can slap them on. Not as efficient at over-provisioning as VMware, nor does it handle large memory configurations (2TB or more) as well as VMware or Hyper-V.

Hyper-V - Great for Microsoft Windows instances, limited support for most NIX systems. Works well with all things “Microsoft”; pricing ranges from free to “OMG I PAID WHAT?”. Runs better on newer hardware, with slow IOV compared to VMware. Networking-wise, not as many “good” options as VMware, but still a great free server OS and free virtualization platform for developers, as Hyper-V ships with the Pro & Enterprise versions of Windows.

VirtualBox - Runs on nearly every platform, much akin to what VMware GSX used to be (if you’re not sure, it’s OK): simplified virtualization that is cross-platform and enterprise-ready. This product does not get the attention it deserves. In benchmarks, performance greatly depends on which OS you install it on; top speeds were on openSUSE & Windows 2019 pre-November patch.

QEMU & KVM - If you are a NIX ninja and like command-line operations, spending nonsensical hours configuring every little damn chunk of memory, or need to intercept memory calls in real time, both are good, provided you LOVE C and the NIX command-line shell.

Docker - Once, not long ago, it was good for a free product. You could write Chef and Puppet scripts to spin thousands of instances up and down in real time and do it all for inexpensive to free; then Docker became greedy. The problem is the shortcomings: port issues, network performance issues, and portability issues. Sure, it’s still “free” for individuals, but for the enterprise it’s now a cost center, and there are better options that just work better, including Hyper-V and Xen.

Many things break on Docker because it’s not a dedicated hypervisor; it’s a drop-in, add-on-top product that isn’t keeping up with changes at the OS level. Other products handle portable virtualization better.

From a developer standpoint, run VirtualBox or Hyper-V; you can run both on the same machine. You will find you can export your image to nearly anything.

From a sysadmin standpoint, VMware is hard to beat if money is no object. If budget is a concern, then it becomes a question of which is more important, clustering or performance; you can do your own ROI and figure it out.

@Terry_Whitney Wow! Thanks for the overview of virtualization! I didn’t realize there were so many options. Unfortunately, I think we’re pretty committed to containerd at this point.

It struck me today that I’d been using an undocumented feature of writelog(). The docs say that if you use the log attribute, the accepted values are application or scheduler. I’ve been using a value of taskqueue to write to my own logger defined in lucee-web.xml.cfm.
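In other words, something like this (a sketch; “taskqueue” is just the name of my custom logger, not a documented value):

<cfscript>
// Documented values for the log attribute are "application" and "scheduler"...
writeLog( text="task engine heartbeat", log="application" );
// ...but I've been pointing it at my own logger defined in lucee-web.xml.cfm:
writeLog( text="task engine heartbeat", log="taskqueue" );
</cfscript>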

It’s been working fine. But maybe there’s something new in 5.3.9 that means this doesn’t fly anymore? Though it’s strange that it partially works.

I have not had time to set up better testing to find definitive issues but, for anyone interested, this is how I’ve resolved my logging needs:

TOMCAT 9.0.63
JDK11 (Corretto)
LUCEE 5.3.9.141

The issue seemed to be that logging would stop whenever a native Java object was created (e.g. createObject('java','java.net.URL')).
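The shape of the repro was roughly this (a sketch only; the URL is just an example):

<cfscript>
// Roughly what the failing containers were doing: taskqueue entries would
// appear for a while, then stop some time after a native Java object had
// been created in a request.
writeLog( text="before creating java object", log="taskqueue" );
endpoint = createObject( "java", "java.net.URL" ).init( "https://example.com/" );
writeLog( text="after creating java object", log="taskqueue" );
</cfscript>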

I didn’t want to give up my custom log, and I assumed that pointing to my own log (‘taskqueue’) really should work despite it not being documented, i.e. I kept writelog(text="#message#", log="taskqueue"). Though, by this point, I’d moved the taskqueue log definition into lucee-server.xml:

<logger appender-arguments="streamtype:output" appender-class="lucee.commons.io.log.log4j2.appender.ConsoleAppender" 
    layout-arguments="pattern:&quot;%p&quot;,&quot;%d{yyyy-MM-dd HH:mm:ssXXX}&quot;,&quot;%t&quot;,&quot;%c{1}&quot;,&quot;%m&quot;%n" 
    layout-class="org.apache.logging.log4j.core.layout.PatternLayout" level="info" name="taskqueue"/>

I believe it was after configuring Tomcat to log with Log4j that I found all logs, regardless of whether the container was instantiating a native Java object, were spitting out an error:

ERROR Attempted to append to non-started appender web.0f82d99d7804e40dd3490107136cb8cc.application

I figured that 0f82d99d7804e40dd3490107136cb8cc was the Lucee context hash, so I simply added that to the Tomcat Log4j configuration:

<?xml version="1.0" encoding="UTF-8"?>
<Configuration status="WARN">
  <Appenders>
    <Console name="luceeConsole">
      <PatternLayout pattern="&quot;%p&quot;,&quot;%d{yyyy-MM-dd HH:mm:ssXXX}&quot;,&quot;%t&quot;,&quot;%c{1}&quot;,&quot;%m&quot;%n" />
    </Console>
  </Appenders>
  <Loggers>
    <Root level="ERROR">
      <AppenderRef ref="luceeConsole" level="ERROR"/>
    </Root>
    <Logger name="web.0f82d99d7804e40dd3490107136cb8cc" level="ERROR" additivity="false">
      <AppenderRef ref="luceeConsole" level="ERROR"/>
    </Logger>
    <Logger name="web.0f82d99d7804e40dd3490107136cb8cc.taskqueue" level="INFO" additivity="false">
      <AppenderRef ref="luceeConsole" level="INFO"/>
    </Logger>
  </Loggers>
</Configuration>

Log entries no longer halt for the containers instantiating native Java objects, but all log entries are duplicated. To eliminate the duplication I adjusted the layout pattern in one of the two configurations so Logstash could distinguish between them; Logstash then simply drops one of the entries.


I’ve filed a bug referencing this thread, feel free to update it

https://luceeserver.atlassian.net/browse/LDEV-4022

I’ve checked this issue with both lucee 5.3.9.141 and LDEV-5.3.9.143-SNAPSHOT.

In my testing, I also faced the same issue of the custom logs stopping logging entries in Tomcat (not a Docker container) after some time. But for me, the logging stopped after any other log had logged entries at the ERROR level, not after instantiating a Java object (createObject("java","java.net.URL")) as you mentioned.

The issue was solved in the latest Lucee version, 5.3.9.143-SNAPSHOT, by ticket LDEV-4035.
@Brian_Harcourt Can you please check with that version and report back here? It will help update the status of ticket LDEV-4022 - 5.3.9 stops logging while in a docker container.


I’ve spun up one of my task engine containers with 5.3.9.143-SNAPSHOT, leaving all Tomcat logging at the default. I’m seeing the Lucee scheduler log continue to run, but my custom taskqueue log halts. Taskqueue logged successfully for about 5 minutes and then halted, yet the scheduler log continues to be produced. I think this is an improvement.

I’m not seeing ERROR-level entries in any logs, but it’s possible that my Logstash config is somehow missing them. In other words, my guess about Java object instantiation being the issue was just that: a guess. And I can’t say that your finding about ERROR logging is not the cause. But either way, it seems my issue still exists in 5.3.9.143.

UPDATE
It may be that I was misreading my recent test. I’ve noted some other config issues that may have altered results. When I have a chance I’ll return for a better test of 5.3.9.143.


I have spun up an additional test with 5.3.9.143-SNAPSHOT (again leaving tomcat logging at default).

I’m happy to report that the logs are continuing to be written! My view is that my issue was indeed resolved in 5.3.9.143!

Thank you!!

B.


Thanks, @Brian_Harcourt, I’ll mark that ticket as resolved.

We are getting this on 5.4.1.8; we weren’t getting it on 5.3.9.166.

So this has possibly regressed.

@AdamCameron are you using the default logging config or custom?

Either way, let’s start a new thread for this and just reference this one?
