Java 16 performance boost and Lucee mods

I just got motivated to work on more Lucee modifications in my fork. I found it really easy to update from an older Java 11 release to the latest one and deploy that to production, and everything works fine. Java does a good job of not breaking things.

So then I took a look into using Java 16 today.

I found it trivial to upgrade from Java 11 to Java 16 after the work I did back in 2018 to make it use more of the JDK library instead of custom code, and to upgrade the ASM bytecode library versions.

I have Maven set up to compile to Java 16 instead of Java 8 like the official version of Lucee. People might not realize that Lucee 5.3 still targets Java 8 bytecode, even though it runs on the Java 11 virtual machine. I don't know enough to say for sure whether that limits the JVM's under-the-hood optimizations, but I think it does.

<maven.compiler.source>1.8</maven.compiler.source>
<maven.compiler.target>1.8</maven.compiler.target>
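For reference, switching the target is just a matter of changing those properties. With recent versions of the Maven compiler plugin you can also use the single `release` property instead of the `source`/`target` pair, which additionally checks API usage against the right JDK (this is the standard Maven mechanism, not anything specific to my fork):

```xml
<maven.compiler.release>16</maven.compiler.release>
```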

I haven't experienced any problems with my version of Lucee in 3 years now, and it's pretty much immune to security issues since the attack surface for them simply isn't there.

My Lucee is 30,000 fewer lines of Java and contains none of the admin or CFML code, so it's about a 30% smaller project. I wanted it small to reduce the surface area to only the features I want us to be using, since the majority of the Adobe framework features are trash. I also don't allow any of the direct Java / execute stuff to run in my version; it was deleted entirely. I use other tools to fill in the gaps where you need system access.

I didn't see the point of continuing to have an OSGi build process when it is actually easier to do the Java build directly. On the weekend I found out that I could just drag the loader project into the core and then get rid of OSGi pretty quickly. It only took a few hours to make my first working build without OSGi. I think OSGi was good for a community project, but it's bad for what I want to do, since I just want to hack on it quickly and not worry about distribution or modularity. I actually converted the project back to how it was with Railo, where you could build and run the project directly in the IDE. I also modified the CLI script so I can call my CFML code directly in the IDE, so I don't even need a server running to test my changes.

My version of Lucee has a static Java configuration instead of the admin stuff, so none of the XML parsing or Felix OSGi jar loading occurs. And it's a lot more secure, since the admin options can't be attacked when they don't exist. I don't have an internal method to update those settings either; I deleted all of that.

I did quite a bit of testing with parallel jar loading and Felix optimizations before, and I had Lucee loading pretty fast, but once I got rid of Felix OSGi, I had Lucee loading in about 650ms. When I compiled it as Java 16, it was able to load in just 350ms. That was pretty exciting, because it means a larger CFML application should start significantly faster on Java 16.

It's really cool being able to build and see a new version of the language load in the IDE in just a second or two now. I had my custom Lucee build down to just 20-30 seconds before getting rid of OSGi, but now it is pretty much instant. I think Lucee should have some kind of rapid dev entry point, with OSGi kept on the side; you could have logic that makes the whole OSGi system optional while testing your changes. It's a bit mental to be waiting minutes for every change. Maybe more people would try hacking on Lucee if it just ran straight away in the IDE again.

When I click Run in IntelliJ IDEA, Lucee cold boots without Tomcat, goes directly into the dummy servlet request, and runs my first HTML request in about 400ms if I haven't changed anything. On a production server, it may be even faster.

My goal is to deploy a version of Lucee without a big servlet container like Tomcat soon. I have a lightweight HTTP parser I built on top of Java's async socket channels, which is super fast. I'm going to use it like a firewall for Lucee, so that it can't get overwhelmed by too many slow threads. I'm going to mark which URLs are slow/fast and try to manage it so that there are always fast threads that can finish quickly.
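The slow/fast URL gating could be sketched like this (a hypothetical illustration of the idea, not the actual front-end code; the paths and permit count are made up):

```java
import java.util.Set;
import java.util.concurrent.Semaphore;

// Sketch of the "firewall" idea: URLs marked slow compete for a small
// permit pool, so no matter how many slow requests pile up, there are
// always worker threads left for the fast requests to finish quickly.
public class SlowRequestGate {
    private final Set<String> slowPaths;
    private final Semaphore slowPermits;

    public SlowRequestGate(Set<String> slowPaths, int maxConcurrentSlow) {
        this.slowPaths = slowPaths;
        this.slowPermits = new Semaphore(maxConcurrentSlow);
    }

    /** Returns false if a slow URL should be turned away (e.g. 503) right now. */
    public boolean tryEnter(String path) {
        if (!slowPaths.contains(path)) {
            return true; // fast URLs always pass
        }
        return slowPermits.tryAcquire();
    }

    /** Call when a request finishes to free its permit. */
    public void exit(String path) {
        if (slowPaths.contains(path)) {
            slowPermits.release();
        }
    }
}
```

Rejecting excess slow requests up front is what keeps the server from being overwhelmed: the fast path never blocks on the slow pool.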

I also updated all the dependency jars to the latest version and that seems to work so far.

I still defend using CFML, and I keep improving our application. I really think it's great how I can further optimize our application by just making more tweaks to the Lucee Java, and maybe start to add Java features. I think my version of Lucee is about 4 times faster because not only did I modify how it works, I also wrote my CFML application to take advantage of the most efficient bytecode possible, which means always doing explicit scoping for everything. That work I did on the bytecode optimizations in 2018 was really hard, but it works great.

I was getting around 1,500 requests per second with a simple query and 7,000 requests per second with a one-line CFC request that also runs application.cfc, on my quad-core Intel 4790K CPU in Windows. A newer CPU could do even more.

I’m going to integrate my custom Java web server into Lucee soon and look at modifying how queries work to see if I can squeeze anything more out of that. Queries are the biggest drain on performance. I already use the Lazy feature as much as possible and avoid running a lot of queries.

I also modified Lucee to see what happens when I call CFML code directly, without going through the normal application listener flow, and it runs about twice as fast that way. I can get as low as 0.15ms per request if I bypass application.cfc for a hello-world type test. I think if I replace the application.cfc behavior in the Lucee internals with something that doesn't have to keep re-running to maintain the configuration, I could get up to a 50% performance improvement on the simplest CFML requests while still maintaining the state of the application. Maybe the difference comes down to loading two page contexts instead of one. Maybe I could keep the configuration static and "cheat" the load process, since Lucee is clearly much faster when it doesn't have to load application.cfc at all. I think application.cfc was designed in a way that wastes resources on each request re-creating settings that are generally meant to survive the life of the server. I just need a way to reload the configuration on code changes; the rest of the time, it should never need to change. If I could just deep copy the request configuration from a static cache, it should be faster than running lots of individual CFML commands.
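The static-cache idea above could look something like this (a hypothetical sketch, not Lucee's internals; the setting names are invented for illustration):

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of "cheating" the load process: build the application settings
// once, keep them in a static cache, and hand each request a cheap copy
// instead of re-running the application.cfc logic on every request.
public class AppSettingsCache {
    private static volatile Map<String, Object> cached;

    /** Per-request settings: a copy of the cached base configuration. */
    public static Map<String, Object> forRequest() {
        Map<String, Object> base = cached;
        if (base == null) {
            synchronized (AppSettingsCache.class) {
                if (cached == null) {
                    cached = loadOnce();
                }
                base = cached;
            }
        }
        // Shallow copy per request; a deep copy would be needed
        // if the values themselves are mutable.
        return new HashMap<>(base);
    }

    /** Call on a code change to force a rebuild on the next request. */
    public static void invalidate() {
        cached = null;
    }

    // Stand-in for the expensive one-time configuration load.
    private static Map<String, Object> loadOnce() {
        Map<String, Object> m = new HashMap<>();
        m.put("sessionTimeout", 1800);
        m.put("datasource", "main");
        return m;
    }
}
```

Each request gets its own copy, so per-request mutations can't leak into the shared configuration, and the expensive load only happens once per code change.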

I also figured out how to get my first 3-server cloud cluster working with our application, and I can see Lucee will work just fine on that. I implemented my own session syncing instead of using the built-in session scope, because I wanted to control when the data access occurs. Using cookies, I made it so the client tells the server when it is out of date, instead of the opposite. I'm not sure if Lucee does that when you store sessions in the database, but my approach has nearly zero overhead on each request: it lets me continue to act like the session is always in shared memory, yet I still write sessions to the database when they change and replicate them so that state is not lost if failover occurs.
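A minimal version of that cookie-driven sync could be sketched as follows (hypothetical code, assuming the cookie carries a session version number; the class and method names are mine, not from the fork):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of client-reported session staleness: each session has a version,
// the cookie echoes the version the client last saw, and the server only
// goes to the database when the client reports being newer than the
// in-memory copy (e.g. after a failover to another node).
public class VersionedSessions {
    static class Session {
        long version;
        Map<String, Object> data = new ConcurrentHashMap<>();
    }

    private final Map<String, Session> inMemory = new ConcurrentHashMap<>();

    /** cookieVersion is from the client's cookie; -1 means no cookie yet. */
    public Session get(String id, long cookieVersion) {
        Session s = inMemory.get(id);
        if (s == null || s.version < cookieVersion) {
            s = loadFromDatabase(id); // only on a miss or a stale copy
            inMemory.put(id, s);
        }
        return s; // normal case: shared-memory speed, no database hit
    }

    public void put(String id, Session s, String key, Object value) {
        s.data.put(key, value);
        s.version++;              // bump version; the new value goes in the cookie
        writeToDatabase(id, s);   // persist only because something changed
    }

    // Stand-ins for the real persistence/replication layer.
    private Session loadFromDatabase(String id) { return new Session(); }
    private void writeToDatabase(String id, Session s) { }
}
```

The key property is that an unchanged session costs no database work on a request at all; the database is only touched on writes and on the rare stale read.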

I like to eliminate queries and disk I/O as much as possible, and this matters even more if you are doing cloud database replication like MariaDB Galera. So I wanted to keep an application where dynamic requests can load without running queries, plus some sticky load balancing that can still handle failover. Nginx was really good for that. I was able to make an excellent configuration on DigitalOcean on their smallest droplets for just $15 per month for 3 servers, but then I upgraded to the next size up, since 1 GB of RAM is a little too tight for MySQL and Lucee; 2 GB is enough for testing. That's pretty darn cheap, and the performance with the speed of Lucee and Java is excellent. I will probably run much faster systems in production, but it's pretty impressive how efficient you can make things if you are careful like I am to avoid disk I/O.
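For illustration only (this is not my actual config, and the IPs and port are made up), a minimal nginx setup for sticky balancing with failover can use the standard `ip_hash` method, which pins a client IP to one server and automatically redistributes its requests if that server goes down:

```nginx
upstream lucee_cluster {
    ip_hash;                  # sticky: requests from one client IP stay on one server
    server 10.0.0.1:8888;
    server 10.0.0.2:8888;
    server 10.0.0.3:8888;
}

server {
    listen 80;
    location / {
        proxy_pass http://lucee_cluster;
        proxy_next_upstream error timeout;  # retry the next server if one fails
    }
}
```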
