Lucee doing something weird with third-party jar files (change between 5.3.7.47 and 5.3.9.133)

G’day
@Zackster has asked me to copy this from a thread on Slack to here, so that’s what I’m doing. It’s gonna be mostly copy and paste, so apologies if it doesn’t flow so well.The situation also evolves as it continues, so the symptoms I initially notice might or might not be relevant to the eventual actual problem. It’s all just part of me trying to work out what’s going on.

Copy and paste from Slack begins…

Adam Cameron:
What might cause the dump of the same Java object on one Lucee 5.3.9.133 instance to include a section bundle-info at the bottom, and a dump of the same object on a different Lucee 5.3.9.133 instance to not include that section?

Both built from the same lucee/lucee:5.3.9.133-nginx Docker image. The Docker config for each is superficially different, but not in any way that ought to impact this.

IE: one dump ends with:

image

The other ends with:

image

Zac Spitzer:
What’s the object?

AC:
It’s a third party thing that will be meaningless to you. Why might that matter? (I’m not suggesting it won’t matter… just trying to understand why you think it might).

The code in both cases is:

<cfdump var="#CreateObject("java", "com.vendor.app.ClassName")#"><cfabort>

It’s in the same jar file in the same location in both cases (as per bundle-info.location in the first dump).

Dan Switzer II:
Is it possible the object was loaded via different methods or that one dump is showing the object after having been created in the request and the other showing the object that’s being cached in a scope (or via some other caching mechanism)?

AC:
Good question, but I mean it’s literally that ^^^ exact code (other than the path being sanitised) running on both servers. I copied it from one to the other.

Now… one is being called in a single-line .cfm file (containing only that code. No Application.cfc or anything), and the other is via a bunch of other code (including a CFWheels app in the way), but I’ve made sure that this code is the first and only time CreateObject("java", "com.vendor.app.ClassName") is being called (and it’s the only reference to com.vendor.app.ClassName in the codebase), as I did not want any other code “interfering” with it before I call it. I am… 95% nothing else is in play.

Both dumps are being done immediately after a Lucee restart, and it’s the first request being run on each instance after restart.Any number of other reqs to same code subsequently has no impact on what is dumped.

BTW: it’s the version of the code running inside the app that has the bundle-info bit. The one in the straight test.cfm file does not.

[time passes]

OK. A different question. WTF is Lucee doing to my jar file??

image

I copied the .jar file to /opt/lucee/server/lucee-server/context/lib/ezsign-4.1.3.jar
I restarted the container
I diffed the two files: same
I made the request that has the createObject call in it
I diff the two files: now the one at /opt/lucee/server/lucee-server/context/lib/ezsign-4.1.3.jar is different (as indicated above).

DS2:
It feels like there’s a class loader involved in the one showing the bundle-info.

As for the other question, I wonder if the change is to make the JAR OSGi compliant.

AC:
That said, it’s doing the same thing in both cases, but WTF is it up to?

to make the JAR OSGi compliant.

Oh God. Probably. [despair]

We did have some other jars being loaded via referencing them in this.javaSettings in Application.cfc (or CFWheels’s approximation thereof), but I just removed that and no change.

Good thinking though.

DS2:
Did you restart Lucee? There could be some caching going on. It always seems like I’ve got to restart Lucee when changing JAR settings if I want to be 100% sure my changes take affect. (This is especially true with ACF)

AC:
[Yes]

I’ve factored-out the rest of the application now. I’m using the same one file (exact same file… using a volume from my host machine) in both containers.

Yeah I am restarting the container between every test.

[irrelevant details snipped]

DS2:
It feels like the bundle-info is seen when a JAR is OSGi compliant and it’s not there when using a non-OSGi compliant JAR.

AC:
Yeah I did wonder about that sorta thing (didn’t think Lucee would actually be changing the jars though!!) but it’s doing it in both envs

DS2:
I wonder if maybe in the environment we’re you’re not seeing it, there’s a directory permission preventing the OSGi change you’re seeing in the other environment (but maybe you said it was the reverse behavior)

AC:
I just checked both “changed” files (and they were both changed). binary different, but when I unzipped them, the contents were identical. So am guessing the “difference” is some meta info the re-jar-ing process does or something.

OK… progress…

If I copy the “updated” jar file from the “working” container to the “broken” container, restart, retest… I now get the bundle-info in the “broken” container too.

So this sez to me that whatever Lucee is doing to the jar… it’s doing it differently in each of these containers.

I blew away the broken container completely and replicated…

  • at first broken.
  • replace the “updated” jar file with the initial “unfixed” one… broken.
  • replace the “broken” container’s “fixed” jar with the “fixed” one from the original container… now the broken container “works” as well.

[time passes]

I need to return to this thread about how Lucee messes with third-party jars, and the ramifications thereof.

To recap I noticed that how the same Java object dumps out varies for me across two “identical” Lucee containers, and in looking into this I noticed that Lucee is actually changing the third-party jar files. This is… subpar, but it is what it is.

In further troubleshooting this, I am noticing some anomalous behaviour that I am struggling to understand and I am 95% certain it’s down to Lucee munging with things that it shouldn’t.

So I stick this jar into /opt/lucee/server/lucee-server/context/lib/ via the Dockerfile. I build the container, wait for Lucee to gets its act together and be responsive, and I hit my test code. First time I hit it I get this:

No Bundle provided

lucee.runtime.exp.NativeException: No Bundle provided at
java.base/java.util.Objects.requireNonNull(Objects.java:246) at 
org.apache.logging.log4j.core.osgi.BundleContextSelector.locateContext(BundleContextSelector.java:137) at 
org.apache.logging.log4j.core.osgi.BundleContextSelector.getContext(BundleContextSelector.java:124) at org.apache.logging.log4j.core.impl.Log4jContextFactory.getContext(Log4jContextFactory.java:230) at 

@Brad_Wood if yer reading, this might be familiar to you ([COMMANDBOX-1460] - Welcome). NB: CommandBox is not in play here, I’m just pinging you cos you’ve experienced this too.

Note: I am not directly calling anything to do with Log4J, but the third party code has a wrapper around same, so it will be.

If I wait a few sec and run the test code again, this time I get:

Could not initialize class com.Moo.Zoo.log.GooLog

lucee.runtime.exp.NativeException: Could not initialize class com.Moo.Zoo.log.GooLog at 
com.Moo.Zoo.Foo.<init>(Foo.java:103) at com.Moo.Zoo.Foo.<init>(Foo.java:119) at 
java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at 
java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) at 
java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at 

(I’ve changed a few things in there to redact the vendor & their app)

OK so that’s already weird.

This stays the same for as many re-runs as I like.

UNLESS

I restart Tomcat (ie via shutdown.sh and startup.sh).

Then the code just works.

NB: restarting the container doesn’t work. It needs to just be a restart of tomcat in the running container.

I can reliably repeat this as many times as I want.

So what I’d like a clear straight answer on, please, is… WTF?

If we can’t explain the entire [wt]F, then perhaps some lucidity on what Lucee is doing to these poor jar files, and also why I’m getting three different results depending on whether it’s the first hit to the jar, subsequent hits to the jar, or subsequent hits to the jar after a Tomcat restart.

NB: if I add ENV LUCEE_JAVA_OPTS="-DLog4jContextSelector=org.apache.logging.log4j.core.selector.ClassLoaderContextSelector" into my Dockerfile (thanks for the tip there Brad), then the thing works right from the outset.

ZS:
can you post this question to the mailing list? it’s a good important topic and it would be good to have this visible (and hopefully the solution) to world for ever, unlike this black hole

Brad Wood:
[…] I can’t explain most of that, but I do know at least one thing-- and I don’t know if it’s a Lucee behavior or a Java classloader behavior. But I’ve noticed over the years playing with Java libs that if the INITIALIZTION of a class fails, which can include the static block or the constructor, Lucee/Java/whatever will NOT try and load the class again until you restart the server. That means the “real” error will be a one-time gig and if you miss it, you’ll be chasing smoke with the secondary error that happens on all subsequent attempts. (

[…]

On that Log4j osgi bit I never did get to the bottom of it. It was related to Log4j being used in an OSGi environment and attempting to find it’s classes via a bundle (instead of straight CL hierarchy). I chased the error all the way to the precipice of the dark chasm of Lucee’s OSGI classloading and gave up by just disabling the Log4j feature entirely, lol

AC:
Yeah if it was on me, I’d get rid of the logging too. We don’t even use the part of the app that does any logging. But… outwith our control.

I think just adding the -D setting will solve it for us, but I this whole thing smells a bit.

[time passes]

Back on topic (soz), just tested this on CF and it just works, as one would expect and perhaps hope for of Lucee.

Oh and PS: this is all new to 5.3.9.133 . It works fine on 5.3.7.47 .

[fin]

OK so that’s a wall of text, sorry.

I’d really like an understanding of what Lucee is up to here, and also in what way it’s meddling has changed between 5.3.7.47 and 5.3.9.133 to mess things up.

Cheers.


Adam

1 Like

I have zero answers to your What’s That For!?

I am curious though if you get the same behavior on the latest 5.3.9.150?

Very good question. Confirmed: same behaviour as 5.3.9.133.

Cheers for reminding me to check that.


Adam

A reproducible test case would be super useful…

Not possible I’m afraid. It’s a paid-for product.

Also perhaps Brad’s CommandBox description is sufficient to repro? See earlier link above to [COMMANDBOX-1460] - Welcome

It was 100% reproducible in CommandBox without the fix on that ticket. But it was never clear you me if it was a bug in Log4j or Felix or Lucee’s usage of the two.

As it’s a behavioural change between two Lucee versions, it works fine on CF, and… Log4J has somewhat more credibility & ubiquity as a Java project than Lucee has (and yes I am mindful of various recent… “situation”… with Log4J when I say this), I’d be pretty much focusing on changes made in Lucee between 5.3.7.47 and 5.3.9.133.

I get a similar problem after upgrade to 5.3.9

https://lucee.daemonite.io/t/lucee-5-3-9-133-problem-using-a-jar/10129/2

Sorry for resurrecting this after a year, but I think we stumbled across the same issue when trying to instantiate a certain Apache POI class (XWPFDocument) using createObject().
Tested on 5.3.9.141 and 5.4.1.8 (latest release).

I uploaded a repro case including all dependencies here: Filebin | 6upr9tbg6fp10d6c
It contains a Lucee test case and a Java-only test case for cross-checking.
To reproduce in Lucee:

  • Unzip the archive into a directory
  • Run test.cfm - Lucee should throw a NullPointerException with message “No bundle provided”, with the stack trace pointing to Log4j.

As already described in this thread, the bundle error message will only occur upon the first invocation. To work around Lucee’s cache logic, simply rename the file poi-ooxml-5.2.3.jar to something else and try again. (Each file name can only be used once, so renaming back and forth doesn’t do the trick. Use something like incrementing numbers instead.)

Setting the Log4jContextSelector variable - like suggested in this thread - gets rid of the NullPointerException.
(As a side note: On 5.3.9.141, setting Log4jContextSelector prevents the NullPointerException, but a NoSuchMethodError occurs instead, in a different part of the code. The NoSuchMethodError exception does not happen on 5.4.1.8.)

When running the separate Java test case outside of Lucee, it just works (obviously).
To run the Java-only test, compile and run test1.java inside the extracted directory:

javac -cp commons-collections4-4.4.jar:commons-compress-1.21.jar:commons-io-2.11.0.jar:log4j-api-2.20.0.jar:log4j-core-2.20.0.jar:poi-5.2.3.jar:poi-ooxml-5.2.3.jar:poi-ooxml-lite-5.2.3.jar:xmlbeans-5.1.1.jar test1.java
java -cp .:commons-collections4-4.4.jar:commons-compress-1.21.jar:commons-io-2.11.0.jar:log4j-api-2.20.0.jar:log4j-core-2.20.0.jar:poi-5.2.3.jar:poi-ooxml-5.2.3.jar:poi-ooxml-lite-5.2.3.jar:xmlbeans-5.1.1.jar test1

My main question is probably this: Is the workaround with the Log4jContextSelector variable considered sideeffect-free?
I don’t quite like the thought of having to configure it on every application server just to be able to use one library, so a “real” fix would be awesome.

Thanks a lot for looking into this!

dunno if this makes any difference, but Lucee 5.4 already bundles log4j 2.20.0

Interesting find…
I probably should’ve been more clear on the NoSuchMethodError exception which occurred on 5.3.9.141 but not on 5.4.1.8. This one didn’t have anything to do with Log4j, as far as I can tell from the stack trace.


But I guess it could still be related somehow.

Edit:
Giving it some more thought, I’d guess that it’s related to the version of org.apache.commons.io bundled with Lucee, and Lucee’s package taking precedence over the context attribute in createObject()? But that’s mere speculation on my part.