G’day
@Zackster has asked me to copy this from a thread on Slack to here, so that’s what I’m doing. It’s gonna be mostly copy and paste, so apologies if it doesn’t flow so well.The situation also evolves as it continues, so the symptoms I initially notice might or might not be relevant to the eventual actual problem. It’s all just part of me trying to work out what’s going on.
Copy and paste from Slack begins…
Adam Cameron:
What might cause the dump of the same Java object on one Lucee 5.3.9.133 instance to include a section bundle-info
at the bottom, and a dump of the same object on a different Lucee 5.3.9.133 instance to not include that section?
Both built from the same lucee/lucee:5.3.9.133-nginx
Docker image. The Docker config for each is superficially different, but not in any way that ought to impact this.
IE: one dump ends with:
The other ends with:
Zac Spitzer:
What’s the object?
AC:
It’s a third party thing that will be meaningless to you. Why might that matter? (I’m not suggesting it won’t matter… just trying to understand why you think it might).
The code in both cases is:
<cfdump var="#CreateObject("java", "com.vendor.app.ClassName")#"><cfabort>
It’s in the same jar file in the same location in both cases (as per bundle-info.location
in the first dump).
Dan Switzer II:
Is it possible the object was loaded via different methods or that one dump is showing the object after having been created in the request and the other showing the object that’s being cached in a scope (or via some other caching mechanism)?
AC:
Good question, but I mean it’s literally that ^^^ exact code (other than the path being sanitised) running on both servers. I copied it from one to the other.
Now… one is being called in a single-line .cfm file (containing only that code. No Application.cfc or anything), and the other is via a bunch of other code (including a CFWheels app in the way), but I’ve made sure that this code is the first and only time CreateObject("java", "com.vendor.app.ClassName")
is being called (and it’s the only reference to com.vendor.app.ClassName
in the codebase), as I did not want any other code “interfering” with it before I call it. I am… 95% nothing else is in play.
Both dumps are being done immediately after a Lucee restart, and it’s the first request being run on each instance after restart.Any number of other reqs to same code subsequently has no impact on what is dumped.
BTW: it’s the version of the code running inside the app that has the bundle-info
bit. The one in the straight test.cfm
file does not.
[time passes]
OK. A different question. WTF is Lucee doing to my jar file??
I copied the .jar file to /opt/lucee/server/lucee-server/context/lib/ezsign-4.1.3.jar
I restarted the container
I diffed the two files: same
I made the request that has the createObject
call in it
I diff the two files: now the one at /opt/lucee/server/lucee-server/context/lib/ezsign-4.1.3.jar
is different (as indicated above).
DS2:
It feels like there’s a class loader involved in the one showing the bundle-info.
As for the other question, I wonder if the change is to make the JAR OSGi compliant.
AC:
That said, it’s doing the same thing in both cases, but WTF is it up to?
to make the JAR OSGi compliant.
Oh God. Probably. [despair]
We did have some other jars being loaded via referencing them in this.javaSettings
in Application.cfc (or CFWheels’s approximation thereof), but I just removed that and no change.
Good thinking though.
DS2:
Did you restart Lucee? There could be some caching going on. It always seems like I’ve got to restart Lucee when changing JAR settings if I want to be 100% sure my changes take affect. (This is especially true with ACF)
AC:
[Yes]
I’ve factored-out the rest of the application now. I’m using the same one file (exact same file… using a volume from my host machine) in both containers.
Yeah I am restarting the container between every test.
[irrelevant details snipped]
DS2:
It feels like the bundle-info is seen when a JAR is OSGi compliant and it’s not there when using a non-OSGi compliant JAR.
AC:
Yeah I did wonder about that sorta thing (didn’t think Lucee would actually be changing the jars though!!) but it’s doing it in both envs
DS2:
I wonder if maybe in the environment we’re you’re not seeing it, there’s a directory permission preventing the OSGi change you’re seeing in the other environment (but maybe you said it was the reverse behavior)
AC:
I just checked both “changed” files (and they were both changed). binary different, but when I unzipped them, the contents were identical. So am guessing the “difference” is some meta info the re-jar-ing process does or something.
OK… progress…
If I copy the “updated” jar file from the “working” container to the “broken” container, restart, retest… I now get the bundle-info
in the “broken” container too.
So this sez to me that whatever Lucee is doing to the jar… it’s doing it differently in each of these containers.
I blew away the broken container completely and replicated…
- at first broken.
- replace the “updated” jar file with the initial “unfixed” one… broken.
- replace the “broken” container’s “fixed” jar with the “fixed” one from the original container… now the broken container “works” as well.
[time passes]
I need to return to this thread about how Lucee messes with third-party jars, and the ramifications thereof.
To recap I noticed that how the same Java object dumps out varies for me across two “identical” Lucee containers, and in looking into this I noticed that Lucee is actually changing the third-party jar files. This is… subpar, but it is what it is.
In further troubleshooting this, I am noticing some anomalous behaviour that I am struggling to understand and I am 95% certain it’s down to Lucee munging with things that it shouldn’t.
So I stick this jar into /opt/lucee/server/lucee-server/context/lib/
via the Dockerfile. I build the container, wait for Lucee to gets its act together and be responsive, and I hit my test code. First time I hit it I get this:
No Bundle provided
lucee.runtime.exp.NativeException: No Bundle provided at
java.base/java.util.Objects.requireNonNull(Objects.java:246) at
org.apache.logging.log4j.core.osgi.BundleContextSelector.locateContext(BundleContextSelector.java:137) at
org.apache.logging.log4j.core.osgi.BundleContextSelector.getContext(BundleContextSelector.java:124) at org.apache.logging.log4j.core.impl.Log4jContextFactory.getContext(Log4jContextFactory.java:230) at
@Brad_Wood if yer reading, this might be familiar to you ([COMMANDBOX-1460] - Welcome). NB: CommandBox is not in play here, I’m just pinging you cos you’ve experienced this too.
Note: I am not directly calling anything to do with Log4J, but the third party code has a wrapper around same, so it will be.
If I wait a few sec and run the test code again, this time I get:
Could not initialize class com.Moo.Zoo.log.GooLog
lucee.runtime.exp.NativeException: Could not initialize class com.Moo.Zoo.log.GooLog at
com.Moo.Zoo.Foo.<init>(Foo.java:103) at com.Moo.Zoo.Foo.<init>(Foo.java:119) at
java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at
java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) at
java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at
(I’ve changed a few things in there to redact the vendor & their app)
OK so that’s already weird.
This stays the same for as many re-runs as I like.
UNLESS
I restart Tomcat (ie via shutdown.sh and startup.sh).
Then the code just works.
NB: restarting the container doesn’t work. It needs to just be a restart of tomcat in the running container.
I can reliably repeat this as many times as I want.
So what I’d like a clear straight answer on, please, is… WTF?
If we can’t explain the entire [wt]F, then perhaps some lucidity on what Lucee is doing to these poor jar files, and also why I’m getting three different results depending on whether it’s the first hit to the jar, subsequent hits to the jar, or subsequent hits to the jar after a Tomcat restart.
NB: if I add ENV LUCEE_JAVA_OPTS="-DLog4jContextSelector=org.apache.logging.log4j.core.selector.ClassLoaderContextSelector"
into my Dockerfile (thanks for the tip there Brad), then the thing works right from the outset.
ZS:
can you post this question to the mailing list? it’s a good important topic and it would be good to have this visible (and hopefully the solution) to world for ever, unlike this black hole
Brad Wood:
[…] I can’t explain most of that, but I do know at least one thing-- and I don’t know if it’s a Lucee behavior or a Java classloader behavior. But I’ve noticed over the years playing with Java libs that if the INITIALIZTION of a class fails, which can include the static block or the constructor, Lucee/Java/whatever will NOT try and load the class again until you restart the server. That means the “real” error will be a one-time gig and if you miss it, you’ll be chasing smoke with the secondary error that happens on all subsequent attempts. (
[…]
On that Log4j osgi bit I never did get to the bottom of it. It was related to Log4j being used in an OSGi environment and attempting to find it’s classes via a bundle (instead of straight CL hierarchy). I chased the error all the way to the precipice of the dark chasm of Lucee’s OSGI classloading and gave up by just disabling the Log4j feature entirely, lol
AC:
Yeah if it was on me, I’d get rid of the logging too. We don’t even use the part of the app that does any logging. But… outwith our control.
I think just adding the -D
setting will solve it for us, but I this whole thing smells a bit.
[time passes]
Back on topic (soz), just tested this on CF and it just works, as one would expect and perhaps hope for of Lucee.
Oh and PS: this is all new to 5.3.9.133
. It works fine on 5.3.7.47
.
[fin]
OK so that’s a wall of text, sorry.
I’d really like an understanding of what Lucee is up to here, and also in what way it’s meddling has changed between 5.3.7.47
and 5.3.9.133
to mess things up.
Cheers.
–
Adam