Performance: Switching to Java's Regex Engine

New recipe up on the docs site covering how to switch from Apache ORO to Java’s built-in regex engine:

TL;DR — Add this.regex = { engine: "java" }; to your Application.cfc (available since 5.3.8.79).

Apache ORO was retired in 2010. Java’s java.util.regex is faster, actively maintained, and gives you modern features like look-behinds, named groups, and possessive quantifiers.
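For anyone who hasn't used those features, here is a minimal sketch in plain Java, calling java.util.regex directly rather than going through Lucee's regex functions. The class and method names are made up for illustration and are not from the recipe.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class: names are illustrative only.
class ModernRegexFeatures {

    // Look-behind: match digits only when preceded by a literal "$".
    static String digitsAfterDollar(String input) {
        Matcher m = Pattern.compile("(?<=\\$)\\d+").matcher(input);
        return m.find() ? m.group() : null;
    }

    // Named groups: read a capture by name instead of by position.
    static String yearOf(String isoDate) {
        Matcher m = Pattern.compile("(?<year>\\d{4})-(?<month>\\d{2})").matcher(isoDate);
        return m.find() ? m.group("year") : null;
    }

    // Whole-input match, used below to compare greedy vs possessive quantifiers.
    static boolean fullMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(digitsAfterDollar("cost: $42")); // 42
        System.out.println(yearOf("2024-05"));              // 2024
        // Greedy a+ backtracks one character so the trailing "a" can match.
        System.out.println(fullMatch("a+a", "aaa"));        // true
        // Possessive a++ keeps everything it consumed, so the trailing "a" fails.
        System.out.println(fullMatch("a++a", "aaa"));       // false
    }
}
```

The possessive case is the one most likely to surprise: it never gives characters back, which is what makes it cheap but also what makes the second pattern fail.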

It’s not a drop-in swap though — the recipe covers the main gotchas:

  • . no longer matches newlines by default (add (?s))
  • Backreferences in replacements change from \1 to $1
  • $ in replacement strings becomes special
  • ORO-only stuff like \u/\U...\E case modification has no equivalent
  • Curly braces are stricter
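Several of these gotchas can be reproduced in plain Java, since they come from java.util.regex itself. The sketch below is not taken from the recipe: the class and method names are hypothetical, it bypasses Lucee's own regex functions, and reading "stricter curly braces" as "an unescaped literal { is rejected" is my assumption.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical demo class: names are illustrative only.
class RegexGotchas {

    // "." skips line terminators unless DOTALL is on, e.g. via the inline (?s) flag.
    static boolean fullMatch(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    // Group references in the replacement string are written $1, $2, ...
    static String swapWords(String input) {
        return input.replaceAll("(\\w+) (\\w+)", "$2 $1");
    }

    // quoteReplacement escapes "$" and "\" so the replacement is inserted literally.
    static String replaceLiteral(String input, String regex, String literal) {
        return input.replaceAll(regex, Matcher.quoteReplacement(literal));
    }

    // Reports whether the pattern is accepted by the Java engine at all.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(fullMatch("a.b", "a\nb"));             // false
        System.out.println(fullMatch("(?s)a.b", "a\nb"));         // true
        System.out.println(swapWords("john smith"));              // smith john
        System.out.println(replaceLiteral("cost: X", "X", "$5")); // cost: $5
        System.out.println(compiles("{name}"));                   // false
        System.out.println(compiles("\\{name\\}"));               // true
    }
}
```

Without the quoteReplacement call, a replacement of "$5" would be read as a reference to group 5 and throw, which is the "$ becomes special" gotcha in practice.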

We switched the docs build itself over and saw GC parallel phases drop from 244k to 66k and a ~10% build time improvement, mainly from eliminating all the per-call ORO object allocations.


Is there a way for a developer to know which engine is currently enabled at runtime? It would be nice to have a function for this (e.g. getCurrentRegexEngine or something).

Just thinking in terms of a third-party library author: how could I write portable code? I wouldn’t know whether I should use “\1” or “$1”, for example.

Good question, yes you can check via

Docs updated!

Though GitHub is having a moment


https://github.com/lucee/lucee-docs/commit/245e19109b517ae72ae7d86df1e8f17fd4010055