I spent the weekend making a few small changes to Lucee CFML compilation so I can run some of my Java code much faster and more directly, without reflection or Lucee extensions. I was able to circumvent the need to wrap everything in an fld/tld and bypass the overhead of the Lucee bytecode to get my own direct path to Java. Since I wrote this code myself, I can treat it as trusted, so Lucee doesn’t have to keep checking whether direct access is allowed the way it does for other Java integration. Essentially, I needed a better bridge to Java so that more of my Java code doesn’t have to depend on Lucee. I want my Java to depend on a loosely coupled, interface-based module instead, so I don’t have to look at the thousands of Lucee files all the time in IntelliJ. If I had only used a Lucee extension, I wouldn’t have been able to make it as fast or as simple to extend, because Lucee 5.3 doesn’t generate direct-access bytecode for Java.
I figured out how to upgrade the ASM jars from 4.2 to 7.0, then changed the ClassWriter to use COMPUTE_FRAMES and updated all the V1_6 references to V1_8 so that Lucee generates a valid Java 8 class file format for compiled CFML code. I needed that to be able to use invokeDynamic or other newer features later. For the specific feature I wanted, though, I don’t need invokeDynamic or any kind of reflection at runtime.
I was able to add a new global scope to Lucee called the Jetendo scope, which is going to be my bridge between Java and CFML code.
Instead of letting the scope function as a hash map like all of the others, I modified the bytecode generation so that CFML code on this scope translates directly into Java field and method calls with the least overhead possible: no reflection and minimal casting at runtime.
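To make the difference concrete, here is a minimal plain-Java sketch of the two access styles. It is not Lucee's generated bytecode; the `Jetendo` class, its `myField` field and its `callFunction()` method are hypothetical stand-ins named after the examples in this post. The reflective methods resolve the member by name on every call, while the direct methods compile down to a plain field read and a plain virtual call.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Method;

public class DirectAccessSketch {
    // Hypothetical stand-in for the Java object behind the jetendo scope.
    public static class Jetendo {
        public String myField = "hello";
        public String callFunction() { return "called"; }
    }

    // Runtime style: the field is looked up by name on every access.
    static Object readFieldReflective(Object target, String name) {
        try {
            Field f = target.getClass().getField(name);
            return f.get(target);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Runtime style: the method is looked up by name on every call.
    static Object callReflective(Object target, String name) {
        try {
            Method m = target.getClass().getMethod(name);
            return m.invoke(target);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Compile-time style: the member is already resolved when the code is
    // generated, so the JVM executes a plain field read and method call.
    static String readFieldDirect(Jetendo j) { return j.myField; }
    static String callDirect(Jetendo j) { return j.callFunction(); }

    public static void main(String[] args) {
        Jetendo j = new Jetendo();
        System.out.println(readFieldReflective(j, "myField"));
        System.out.println(readFieldDirect(j));
        System.out.println(callReflective(j, "callFunction"));
        System.out.println(callDirect(j));
    }
}
```

Both styles return the same values; the only difference is how much work happens per access.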
I know it is working because I can see the decompiled asmified bytecode is greatly simplified compared to what Lucee usually generates and I still get the right output in CFML.
Then I benchmarked the new direct bytecode compared to the normal CFML code for reading fields and calling functions.
Fields can be read twice as fast with the new bytecode.
Java methods can now be called from CFML 9 times faster than normal CFML UDFs.
I haven’t tried to work with function arguments, more complex objects, or write operations yet, so it’s just one level deep on read tests so far, like jetendo.callFunction() or jetendo.myField.
I don’t know why Lucee always does reflection on Java objects at runtime instead of at compile time, but there is a huge performance benefit to doing that reflection at compile time instead. I think I could patch or replace the way reflection is done for all scopes at some point with code similar to what I wrote for the Jetendo scope. Perhaps the reason it is done at runtime is that if a Java class changes at runtime and gets a new interface, all the CFML code that uses it has to be recompiled to stay correct. I’d rather create a way of dealing with that via a special call to invalidate the cfclasses folder than make all CFML-to-Java access 2 to 9 times slower all the time.
Also, the Java compiler typically converts String concatenation into StringBuilder calls when you write Java. I noticed Lucee bytecode doesn’t do this yet, so every time we use the & operator in CFML, we are making another copy of the string, which gets progressively worse the longer the string grows. If you break these into separate echo() calls, you no longer have the memory waste. I wanted to determine how much faster a StringBuilder approach would be compared to concat, to see if there is anything to gain from changing this. I did 4 concatenations on each loop iteration in each way. The normal Java style was up to 5 times faster than the Lucee & operator bytecode. On my Jetendo scope, I’m going to create a way to use StringBuilder objects for output and for creating named buffers, to make it much more memory efficient. We can of course use arrays in CFML to make it faster too. I might try to figure out how to apply this back to all Lucee concatenation bytecode someday to speed up everything. Then we wouldn’t even need to mess with arrays or other tricks, because the Lucee compiler would automatically use a StringBuilder.
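The two approaches can be sketched in plain Java as follows. This is not the benchmark described above and not Lucee's bytecode, just an illustration with 4 appends per loop iteration; the class and method names are made up for the example. The first method creates a new String (and copies everything built so far) on every step, while the second appends into one growing buffer.

```java
public class ConcatSketch {
    // One new String per step: each assignment copies the whole string so far.
    static String concatEachStep(int loops) {
        String out = "";
        for (int i = 0; i < loops; i++) {
            out = out + "a";
            out = out + "b";
            out = out + "c";
            out = out + "d";
        }
        return out;
    }

    // One shared buffer: appends only copy the new characters
    // (plus occasional internal buffer growth).
    static String withBuilder(int loops) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < loops; i++) {
            sb.append("a").append("b").append("c").append("d");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        int loops = 20000;

        long t0 = System.nanoTime();
        String slow = concatEachStep(loops);
        long t1 = System.nanoTime();
        String fast = withBuilder(loops);
        long t2 = System.nanoTime();

        // Timings vary by machine and JIT warm-up, so treat them as a rough hint only.
        System.out.println("same result: " + slow.equals(fast));
        System.out.println("concat ns:  " + (t1 - t0));
        System.out.println("builder ns: " + (t2 - t1));
    }
}
```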
Another thing I want to improve in the Lucee bytecode is to figure out a way to store the results of all those PageContext function calls in local variables. In a loop, Lucee has to do 3+ extra Java function calls on average for each 1 thing happening in the CFML. If it were able to reuse a local Java variable, the bytecode would be much more efficient. It would be especially useful to apply this optimization before loops, since that is where it would benefit the most. Now of course, HotSpot might figure out how to optimize some of these, but I think the bytecode should read more like a simple Java class instead of being so heavy on function calls. I understand it needs to refer to PageContext at least once for many things, but not thousands of times in a loop. We could track these new local variables in a lookup table so we have the right int offset for ALOAD, etc. I can see how it makes the bytecode easier to write when you aren’t tracking locals, because you can just chain everything together, but tracking locals could be abstracted, similar to how the string key names are already tracked at the bottom of the class. The generator could use helpers like loadScope(scope), loadPageContext(), loadFunction(name, args), loadPageContextImpl(), etc. instead of the basic ASM calls. These helpers could check whether they were already executed in the current udfCall() or not. The first time a helper is run, it could append its localVar=function call to the top of udfCall and then only load the local back to the top of the stack at the actual line we’re on, to guarantee the call doesn’t happen multiple times in a loop. Each additional call would just find the local variable’s int offset and load that instead of running the function again.
If a CFML user is willing to sacrifice being able to add/delete methods in the CFC dynamically (perhaps via a compiler option), you could also cache all the getFunction calls in a local variable to speed them up quite a bit, especially if the arguments structure is the same on each call. getFunction is a complicated thing to replace, so it was easier to just bypass it for now.
It was super hard for me to figure out ASM that wouldn’t crash with InternalError, ClassFormatError, ArrayIndexOutOfBounds or VerifyError. At one point, I wasn’t boxing boolean to Boolean where the code expected to be able to cast to Object, but the error said nothing like that, which made it super confusing. If I were working on the JVM or ASM, the first thing I would do is change the names of these errors and provide some human-readable output instead of the nonsense it gives. Fortunately, once the bytecode works, you unlock a serious performance boost by doing the fewest operations possible.
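A rough sketch of the bookkeeping side of that idea, in plain Java with no ASM dependency. Everything here is hypothetical: the `LocalSlotSketch` class, its `load` method and the string keys are illustrations, not existing Lucee or ASM APIs. The first request for an expression allocates a local variable slot and records a store to emit once at the top of udfCall; every later request returns the same slot, which is the int the generator would pass to ALOAD.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class LocalSlotSketch {
    // Expression key -> local variable slot already assigned to it.
    private final Map<String, Integer> slots = new LinkedHashMap<>();
    // Stores that would be emitted once at the top of udfCall (kept as text here).
    private final List<String> prologue = new ArrayList<>();
    private int nextSlot;

    // firstFreeSlot: first local index not taken by "this" and the method parameters.
    public LocalSlotSketch(int firstFreeSlot) {
        this.nextSlot = firstFreeSlot;
    }

    // Returns the slot to ALOAD. Only the first request for a key schedules
    // the underlying call; later requests reuse the stored local.
    public int load(String expressionKey) {
        Integer slot = slots.get(expressionKey);
        if (slot == null) {
            slot = nextSlot++;
            slots.put(expressionKey, slot);
            prologue.add("ASTORE " + slot + " <- " + expressionKey);
        }
        return slot;
    }

    public List<String> prologue() {
        return prologue;
    }

    public static void main(String[] args) {
        LocalSlotSketch locals = new LocalSlotSketch(2);
        int pc = locals.load("loadPageContext");
        int scope = locals.load("loadScope:variables");
        int pcAgain = locals.load("loadPageContext"); // reused, no second call scheduled
        System.out.println(pc + " " + scope + " " + pcAgain);
        System.out.println(locals.prologue());
    }
}
```

A real implementation would also have to handle slot width for long and double values and invalidate a cached local when the underlying value can change inside the function.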
It is really cool to be able to generate my own version of what CFML code does.
I got so stuck on some of this that I haven’t had the time to set up a good use case for invokeDynamic in Lucee core yet. invokeDynamic really only seems useful when you have to evaluate types via reflection or other lookup methods at runtime, because you don’t know what something is until then, since you can pass everything around in CFML as the “any” type. I’ll have to do that for CFC objects, since we don’t know what they are most of the time. invokeDynamic should be up to 9 times faster for CFML method calls. It does have slightly more overhead than a direct call, and it would always run at the normal slow speed on the first load. That might make invokeDynamic not so great if you want to use a lot of transient objects in CFML, compared to a compiler option that would force static/virtual/interface calls until the class cache is flushed.
I also want to merge the CFML arguments and local scopes and eliminate the concept of arguments as a separate object throughout Lucee core. This would also give function calls less overhead and make CFML code simpler and more natural to write. I’m always having to type myvar=arguments.myvar to avoid typing the scope a bunch of times, because I treat implicit scope accesses as errors. I was going to make arguments an alias for localScope so the compiler wouldn’t break existing code, and the alias would have zero runtime overhead since the compiler would write the bytecode as var1.localScope().get(key) etc.
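The "slow on first load, fast afterwards" behavior can be approximated in plain Java with the java.lang.invoke API. This is only a sketch of the resolve-once-then-reuse idea, not a real invokedynamic call site (javac won't emit one for this code), and the `Greeter` class and the cache keyed by class and method name are assumptions made for illustration.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.reflect.Method;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class HandleCacheSketch {
    // Hypothetical target type; in CFML this would only be known at runtime.
    public static class Greeter {
        public String greet() { return "hi"; }
    }

    private static final Map<String, MethodHandle> CACHE = new ConcurrentHashMap<>();
    static int lookups = 0; // counts how often the slow reflective path ran

    static Object call(Object target, String methodName) {
        Class<?> type = target.getClass();
        String key = type.getName() + "#" + methodName;

        // Slow path runs only the first time this class/method pair is seen.
        MethodHandle handle = CACHE.computeIfAbsent(key, k -> {
            lookups++;
            try {
                Method m = type.getMethod(methodName);
                return MethodHandles.lookup().unreflect(m);
            } catch (ReflectiveOperationException e) {
                throw new IllegalStateException(e);
            }
        });

        try {
            // Fast path: invoke the already-resolved handle.
            return handle.invoke(target);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        Greeter g = new Greeter();
        System.out.println(call(g, "greet"));
        System.out.println(call(g, "greet"));
        System.out.println("reflective lookups: " + lookups);
    }
}
```

A real invokedynamic call site goes further by linking the resolved handle into the call site itself, so later calls skip even the map lookup.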
If we could somehow tell the Lucee compiler the exact CFC to reference (like com:path.to.com), we could further optimize runtime performance with direct Java function calls on those objects. This would be like making a TypeScript version of CFML. I learned how to build the IntelliJ CFML Support plugin the other day so I could explore creating new syntax and code completion concepts in both the plugin and Lucee core, so that I don’t frustrate anyone with code the editor can’t understand. I thought I could even create a fake type system that IntelliJ would understand but the Lucee compiler would just ignore. This would give us code completion with CFC accuracy even for cached CFCs, and that would make IntelliJ an awesome tool for CFML developers, since my #1 complaint about continuing to write CFML/Lucee is the lack of tools that match Java tool features. Currently, only Sublime Text is suitable for Lucee, because it does a decent job with the “all autocomplete” plugin on a big project, but IntelliJ would increase the accuracy dramatically, save real time, and make CFML feel as productive as Java and TypeScript do in the JetBrains IDEs. If I took that approach further and generated bytecode that was CFC type aware, CFC method calls would be 9 times faster without needing invokeDynamic to do it. Having CFC-type-aware code completion in the IDE and Java-speed object calls would be the most amazing upgrade for CFML. I figure we need a way to import CFCs somewhere in the current file in order for the IDE to understand abbreviated names. The IDE also has to understand the absolute paths of all mappings, and it would have to support editing mappings per project to let more dynamic ones work too. The current plugin doesn’t have any of these features. It only understands a component if you create it in the same file. As a hack, you can make it aware of your CFCs by naming a bunch of them in a function you never call.
It would be better if we could get the alt+enter code completion for importing cfc types, and it could search to find them like Java can.
Additionally, it would be cool if CFML structs could be typed somehow, to avoid having to make a CFC for everything, since CFCs are heavy and forced to be in separate files. I could get this working at the IDE level at least, just to make it easier to call functions that have more complex arguments. It could work like a type definition file behind your application that extends what the IDE looks at when analyzing the code you write. Again, like TypeScript for CFML.