Upgrade Regex engine in Lucee

I found myself playing with regex this week. I noticed that the regex engines in Lucee (and in ACF, for that matter) use the Apache Jakarta ORO project, which was retired on 2010-09-01, almost 7 years ago.

Are there any plans to move to a more modern regex engine, or at least to expose Java's built-in regex engine? Maybe we could introduce some replacement functions, or have a switch to choose one engine over the other.

Look-behind support would be a nice addition.
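For reference, java.util.regex already supports look-behind. Here is a minimal Java sketch (the class and method names are made up for illustration) that pulls dollar amounts out of a string without including the `$` itself in the match:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookBehindDemo {
    // (?<=\$) is a positive look-behind: the digits must be preceded by "$",
    // but the "$" is not part of the matched text.
    private static final Pattern DOLLAR_AMOUNT =
            Pattern.compile("(?<=\\$)\\d+(?:\\.\\d{2})?");

    // Returns every dollar amount found in the text, without the "$".
    static List<String> dollarAmounts(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = DOLLAR_AMOUNT.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        // "5" is not preceded by "$", so it is skipped.
        System.out.println(dollarAmounts("Coffee $3, lunch $12.50, tip 5")); // [3, 12.50]
    }
}
```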


This is something I would be curious about as well. While I'm very experienced with regex, I have to admit that I haven't educated myself thoroughly on the pros and cons of the different engines out there.

Which engines do you feel would be good replacement candidates, and why? If there is a consensus, I would consider taking a stab at swapping the engines and submitting a pull request, as I am becoming more and more familiar with the Lucee code. In my mind, the main reasons for the swap would be more features and flexibility in regexes, and perhaps a speed improvement if there is a significant one to be had.

Given that modern Java has a built-in java.util.regex which seems to have been built as a replacement for ORO anyway, wouldn’t it make sense to use it and lose a dependency along the way?

Simply swapping the regex engine is probably a no-go, since it could break legacy code.

But it is probably possible to add a "switch" that lets you choose which regex engine to use.


I understand not changing the regex engine for the existing reXXX functions and tags. What we could do instead is introduce new functions, or a switch like this:

reFind(regex, string [, start] [, returnsubex] [, regexEngine])

Here regexEngine is a new argument. This is in line with how the crypto functions work, and it lets those who care migrate from the old tech to the new. I think the supported engine should be Java, since it is directly available and adds no new dependencies.
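To make the proposal a bit more concrete, here is a rough Java sketch of what a java.util.regex-backed reFind could do under the hood. It mimics reFind's 1-based positions (0 means no match) and omits returnsubex. The `RegexEngine` enum and its values are hypothetical, not an existing Lucee API, and only the Java branch is implemented:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class JavaReFind {
    // Hypothetical values for the proposed regexEngine argument; not a real Lucee API.
    enum RegexEngine { ORO, JAVA }

    // reFind-style search: returns the 1-based position of the first match at or
    // after the 1-based start position, or 0 when there is no match.
    static int reFind(String regex, String input, int start, RegexEngine engine) {
        if (engine != RegexEngine.JAVA) {
            throw new UnsupportedOperationException("only the JAVA engine is sketched here");
        }
        if (start < 1) {
            throw new IllegalArgumentException("start must be >= 1");
        }
        if (start - 1 > input.length()) {
            return 0; // in this sketch, a start past the end simply finds nothing
        }
        Matcher m = Pattern.compile(regex).matcher(input);
        // find(int) resets the matcher and searches from the given 0-based index.
        return m.find(start - 1) ? m.start() + 1 : 0;
    }

    public static void main(String[] args) {
        System.out.println(reFind("l+", "hello", 1, RegexEngine.JAVA));      // 3
        System.out.println(reFind("(?<=h)e", "hello", 1, RegexEngine.JAVA)); // 2
    }
}
```

Keeping the old engine as the default and making Java opt-in per call is what protects legacy code in this design.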


I totally agree with your thoughts here, though regex is fairly standardized. The biggest issue in my mind is that a newer engine that implements POSIX regex more thoroughly might require older code to add escapes in certain cases, while well over 95% of regexes would likely just work.
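One concrete kind of breakage to watch for is syntax that is still valid under java.util.regex but means something different. For example, a POSIX bracket class such as `[[:alpha:]]` is not a POSIX class in Java: the inner brackets are parsed as a nested character class containing the literal characters `:`, `a`, `l`, `p`, `h`, so the pattern compiles without error but silently matches different input. Java spells the ASCII letter class `\p{Alpha}`. A small sketch (class and method names are illustrative only):

```java
import java.util.regex.Pattern;

class PosixClassDemo {
    // Under java.util.regex this is a nested character class of ':', 'a', 'l', 'p', 'h',
    // not the POSIX "alphabetic" class.
    static boolean matchesPosixSpelling(String s) {
        return Pattern.matches("[[:alpha:]]", s);
    }

    // Java's own spelling of the ASCII letter class.
    static boolean matchesJavaSpelling(String s) {
        return Pattern.matches("\\p{Alpha}", s);
    }

    public static void main(String[] args) {
        System.out.println(matchesPosixSpelling("b")); // false
        System.out.println(matchesPosixSpelling(":")); // true
        System.out.println(matchesJavaSpelling("b"));  // true
    }
}
```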

A setting in the Lucee admin to set the default is certainly reasonable, at least. For something like regex I wouldn't necessarily want to encourage switching engines for one-off calls via an argument, but that might be necessary if the rate of problems turns out to be high.

Could you create a test build that runs both engines and records any differences, to help develop a migration guide?
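The core of such a test build could be a simple differential harness: run each (regex, input) pair through both engines and report every disagreement, including one engine throwing where the other does not. A minimal Java sketch, assuming a hypothetical `RegexEngine` interface with reFind-style results; ORO is not bundled with Java, so the "legacy" engine in `main` is a fake stand-in that hard-codes one difference purely to exercise the report. A real harness would plug an ORO-backed implementation in its place.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EngineDiff {
    // Hypothetical abstraction: returns the 1-based match position, 0 if none (reFind-style).
    interface RegexEngine {
        int find(String regex, String input);
    }

    // java.util.regex-backed engine.
    static final RegexEngine JAVA = (regex, input) -> {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.start() + 1 : 0;
    };

    // Runs every {regex, input} case through both engines and reports disagreements.
    static List<String> diff(RegexEngine legacy, RegexEngine candidate, String[][] cases) {
        List<String> report = new ArrayList<>();
        for (String[] c : cases) {
            String regex = c[0];
            String input = c[1];
            String a = outcome(legacy, regex, input);
            String b = outcome(candidate, regex, input);
            if (!a.equals(b)) {
                report.add(regex + " on \"" + input + "\": legacy=" + a + ", candidate=" + b);
            }
        }
        return report;
    }

    // Records exceptions (e.g. syntax errors) as outcomes so they appear in the report too.
    private static String outcome(RegexEngine engine, String regex, String input) {
        try {
            return String.valueOf(engine.find(regex, input));
        } catch (RuntimeException e) {
            return "error: " + e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        String[][] cases = { {"l+", "hello"}, {"[[:alpha:]]", "b"} };
        // Fake legacy engine: NOT real ORO behaviour, just a stand-in for the demo.
        RegexEngine legacyStandIn =
                (regex, input) -> regex.equals("[[:alpha:]]") ? 1 : JAVA.find(regex, input);
        diff(legacyStandIn, JAVA, cases).forEach(System.out::println);
    }
}
```

Feeding it the regexes from an existing test suite or codebase would give the raw material for a migration guide.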

@bennadel blogged about this over the weekend

ACF 2018 Update 5 added the Application.cfc setting this.useJavaAsRegexEngine=true

I have filed a new bug regarding this

Add this.useJavaAsRegexEngine, use java regex engine instead of old Oro engine
https://luceeserver.atlassian.net/browse/LDEV-2892


I'm not sure why ACF keeps adding Application flags; it seems like a hot mess to me.

This is an approach I could certainly get behind.