I did a new experiment tonight: I upgraded my Lucee build to the Project Loom build of JDK 17 to try the virtual threads feature (previously called fibers) that they might release someday. It isn't scheduled to ship in JDK 17, even though it already appears to work very well.
I made my Lucee CFML request threads run as virtual threads, and I set up my benchmarks so I can toggle this on and off and vary the level of concurrency.
I found that virtual threads can finish up to 4 times faster than regular threads. They handle concurrent Lucee CFML requests very well, but lower concurrency is still better, since performance degrades as concurrency increases. These are some of the numbers I got on my quad-core CPU, running my trivial hello-world script for 1 million requests:
10,000 requests per second with 10,000 concurrent Lucee requests.
39,000 requests per second with 1,000 concurrent Lucee requests.
79,000 requests per second with 100 concurrent Lucee requests.
83,000 requests per second with 10 concurrent Lucee requests.
Virtual threads also benefit individual user requests, since the performance boost is there even when there is zero load on the system.
I might make new CFML functions like these, which bypass how Lucee does cfthread, to take advantage of Project Loom. Based on how I see the bytecode is set up, I believe this is the best way to do it:
thread = startThread(component cfcObject, string methodName, boolean virtual, struct args);
resultArray = joinThreads([thread]);
stopThread(thread); // optional
Currently, cfthread depends on the tag body being the code that gets executed, but virtual threads rely on Java lambda expressions and invokedynamic in the bytecode, which is a different structure I don't yet know how to generate. That's why I'm making these functions (not tags) that receive a callback function, with the native Java thread reference passed around. I think I can make this super efficient, without generating any weird CFML bytecode, if I just copy the PageContext, call the method directly on the object, and pass the arguments struct to it. I'd probably skip the cfthread scope entirely and operate only on the struct that was passed in, to minimize the Java operations under the hood, since initializing a regular cfthread is pretty expensive on the Lucee side. There are hundreds of places where faster virtual threads would be useful, and it would be fun to write more things with parallelism.
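For reference, here is a minimal Java sketch of what the underlying virtual-thread call would look like on the Java side. The API names shifted across Loom builds; this uses the Thread.ofVirtual() builder that was eventually finalized in JDK 21, and the class name, thread name, and result value are hypothetical:

```java
import java.util.concurrent.atomic.AtomicInteger;

public class VirtualHello {
    // Start one virtual thread from a lambda, join it, and return its result.
    static int runTask() throws InterruptedException {
        AtomicInteger result = new AtomicInteger();
        Thread vt = Thread.ofVirtual()
                .name("cfml-request-1")    // hypothetical thread name
                .start(() -> result.set(42)); // the lambda is the "body"
        vt.join();                         // the joinThreads() equivalent
        return result.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(runTask());
    }
}
```

The lambda passed to start() is what becomes an invokedynamic call site in the bytecode, which is the structural mismatch with how cfthread compiles its tag body today.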
The cost of cfthread and Java threads in CFML is so high that they are rarely worth using; after testing, you usually find the work doesn't pay back the cost of the thread. Virtual threads, though, should show measurable improvements most of the time.
The new virtual-thread executor is pretty unusual: it supplies an effectively unlimited number of virtual threads and doesn't let you cap how many run at once. You can write your own pooling logic to control this, which is what I did, since performance degrades badly if you spawn thousands of new virtual threads without joining them back to the main thread.
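One common way to add that missing limit is to gate submissions with a semaphore. This is a hedged sketch of the idea, not the actual pooling logic described above; it uses Executors.newVirtualThreadPerTaskExecutor(), the name the Loom preview's executor eventually shipped under in JDK 21, and the class and method names are my own:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicLong;

public class BoundedVirtual {
    // Run `tasks` jobs on virtual threads, but allow at most `limit` in flight.
    static long runBounded(int tasks, int limit) throws InterruptedException {
        Semaphore gate = new Semaphore(limit);
        AtomicLong sum = new AtomicLong();
        try (ExecutorService exec = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 1; i <= tasks; i++) {
                gate.acquire();            // block until a slot frees up
                final int n = i;
                exec.submit(() -> {
                    try {
                        sum.addAndGet(n);  // stand-in for real request work
                    } finally {
                        gate.release();    // free the slot for the next task
                    }
                });
            }
        } // try-with-resources close() waits for all submitted tasks
        return sum.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(runBounded(1000, 100)); // sums 1..1000
    }
}
```

The semaphore applies backpressure on the submitting thread, so you never have more than `limit` virtual threads alive at once, which matches the degradation I saw when concurrency ran unbounded.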
I think JDK 17 (beta) might not be as optimized as JDK 16 (final), because I'm getting slightly lower numbers. I find it more exciting to have a single request finish faster and to make the system more resilient to spikes by keeping lots of unused capacity; my production system is almost always idle because I've already made it so efficient. JDK 17 is scheduled for September this year and will be the next LTS, so it could still be years before these preview features are officially finalized in a popular LTS release. It's a shame Project Valhalla and Project Loom are taking so long, but they will be amazing someday.
Project Valhalla would bring C++-level performance to Java classes without making them any harder to write. The features added to Java in the last few years are pretty cool, though; I'm using the new syntax, and it's nice how they keep finding more ways to reduce how much you have to type.