I find myself playing with regex this week. I noticed that the regex engines in Lucee, and in ACF for that matter, use the Apache Jakarta ORO project. That project was retired on 2010-09-01, almost 7 years ago.
Are there any plans to move to a more modern regex engine, or even to expose the Java regex system? Maybe we could introduce some replacement functions or have a switch to use one engine over the other.
This is something I would be curious about as well. While I’m very experienced with regex, I have to admit that I have not looked closely at the pros and cons of the different engines out there.
Which engines do you feel would be good replacement candidates, and why? If there is a consensus, I would consider taking a stab at swapping the engines and doing a pull request, as I am becoming more and more familiar with the Lucee code. In my mind, the main reasons for the swap would be more features and flexibility with regexes, and perhaps a speed improvement if there is a significant one to be had.
Given that modern Java has a built-in java.util.regex, which seems to have been built as a replacement for ORO anyway, wouldn’t it make sense to use it and lose a dependency along the way?
Where regexEngine is a new argument. This is in line with how the crypto functions work. It allows those who care to migrate from old tech to new tech. I think the supported engine should be Java, since it is directly available and does not add any new dependencies.
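To make the idea of an engine argument a bit more concrete, here is a rough sketch in Java of what the dispatch could look like on the Lucee side. Everything in it is an assumption for illustration: the class name, the Engine enum and this reFind signature are hypothetical and are not taken from the Lucee code base. Only the java.util.regex branch is wired up, because ORO is not part of the JDK.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: none of these names exist in Lucee.
final class RegexEngineSketch {

    // Engines a caller could ask for. ORO stands in for the current default.
    enum Engine { ORO, JAVA }

    // Returns the 1-based position of the first match, or 0 when there is
    // no match, which is how CFML's reFind reports positions.
    static int reFind(String regex, String input, Engine engine) {
        if (engine == Engine.JAVA) {
            Matcher m = Pattern.compile(regex).matcher(input);
            return m.find() ? m.start() + 1 : 0;
        }
        // The legacy path would delegate to ORO here; it is left out so the
        // sketch compiles with the JDK alone.
        throw new UnsupportedOperationException("legacy engine not wired into this sketch");
    }
}
```

A call such as RegexEngineSketch.reFind("[0-9]+", "abc 123", RegexEngineSketch.Engine.JAVA) would go through java.util.regex, while existing code that never passes the argument could keep its current behaviour.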
Totally agree with your thoughts here, though regex has a lot of standardization. The biggest issue in my mind is that an engine which implements the POSIX regex standard more thoroughly might require older code to add more escapes in certain instances, while over 95% of regexes would likely just work.
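One concrete compatibility point on the Java side, as far as I can tell: replacement strings. CFML replacement strings use \1 for a back-reference, while java.util.regex expects $1 and treats a backslash in the replacement as an escape for the next character. The sketch below shows the difference plus a deliberately naive translation helper. The class and method names are hypothetical, and the helper ignores literal $ signs and escaped backslashes, so treat it as an illustration rather than a migration tool.

```java
// Hypothetical sketch: shows how Java treats replacement strings.
final class ReplacementSyntaxSketch {

    // Naive translation of CFML-style back-references (\1) into Java-style ($1).
    // Assumption: the input contains no literal '$' and no escaped backslashes.
    static String toJavaReplacement(String cfmlReplacement) {
        // Regex \\(\d) matches a backslash followed by one digit;
        // the replacement \$$1 emits a literal '$' and then that digit.
        return cfmlReplacement.replaceAll("\\\\(\\d)", "\\$$1");
    }

    // Runs a replace-all through java.util.regex via String.replaceAll.
    static String replaceWithJava(String input, String regex, String replacement) {
        return input.replaceAll(regex, replacement);
    }
}
```

Passing the CFML-style replacement [\1] straight to Java turns "abc" with the pattern (b) into "a[1]c", because \1 is read as an escaped literal 1. Running it through toJavaReplacement first gives "a[b]c".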
A setting in the Lucee admin is certainly a reasonable way to set the default, at least. For something like regex I wouldn’t necessarily want to encourage switching engines in one-off instances with an argument, but that might be necessary if the hit rate for problems is high.