Redis extension — 4.1.0.0-SNAPSHOT, NearCache Improvements

A 4.1.0.0 snapshot is now available. It rewrites the near cache layer, closes a cluster of session-storage and concurrency bugs, and adds a nearCache opt-out flag for multi-node deployments. (Source: PR #19.)

Background — what’s the near cache?

When your CFML code calls cachePut( key, value, ..., "myRedis" ), the Redis extension doesn’t go straight to Redis on the calling thread. It buffers the entry in an in-process map (the “near cache”) and returns control to your CFML almost instantly; a background thread drains the buffer to Redis a tick later. Subsequent cacheGet() calls for the same key are served from the buffer with no network round-trip.
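
To make the buffering idea concrete, here is a minimal Python model of a write-behind cache. It is only a sketch of the behaviour described above, not the extension's Java code: the class name WriteBehindCache, the dict standing in for Redis, and the explicit drain() call (replacing the background thread) are all assumptions made for illustration.

```python
class WriteBehindCache:
    """Toy model of a write-behind near cache; not the extension's actual code."""

    def __init__(self, backend):
        self.backend = backend   # stands in for Redis (a plain dict here)
        self.pending = {}        # key -> value buffered but not yet flushed

    def put(self, key, value):
        # Return immediately: only the in-process buffer is touched.
        self.pending[key] = value

    def get(self, key):
        # Serve from the buffer when possible; otherwise ask the backend.
        if key in self.pending:
            return self.pending[key]
        return self.backend.get(key)

    def drain(self):
        # A background thread would do this; here it is called explicitly.
        for key in list(self.pending):
            self.backend[key] = self.pending.pop(key)


redis_stand_in = {}
cache = WriteBehindCache(redis_stand_in)
cache.put("user:1", "alice")
before_drain = (cache.get("user:1"), "user:1" in redis_stand_in)
cache.drain()
after_drain = (cache.get("user:1"), redis_stand_in.get("user:1"))
```

Before drain() the value is readable locally but absent from the stand-in backend; afterwards both agree. That gap is the window the rest of this post is about.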

The win is real — cachePut() returns in microseconds instead of waiting on Redis. The catch: that buffer has to be airtight under concurrent access. If two threads cachePut() the same key at nearly the same time, one has to win cleanly. If a put and a get race, the get must see the old value OR the new value, never something half-applied. The implementation on master got those guarantees subtly wrong.
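
The "old value OR new value, never half-applied" rule can be sketched as follows. This is an assumption-laden illustration in Python, not the extension's implementation: the Slot class and its value/version pair are hypothetical, and a single lock is just one simple way to get the guarantee.

```python
import threading


class Slot:
    """One cache entry whose value and version must always change together."""

    def __init__(self, value, version):
        self._lock = threading.Lock()
        self._value = value
        self._version = version

    def put(self, value, version):
        with self._lock:              # both fields change under one lock
            self._value = value
            self._version = version

    def get(self):
        with self._lock:              # readers never observe a half-applied put
            return self._value, self._version


slot = Slot("v0", 0)
torn_reads = []


def writer(start):
    for i in range(start, start + 2000):
        slot.put(f"v{i}", i)


def reader():
    for _ in range(2000):
        value, version = slot.get()
        if value != f"v{version}":    # a mismatch would mean a torn read
            torn_reads.append((value, version))


threads = [threading.Thread(target=writer, args=(n * 2000,)) for n in range(2)]
threads += [threading.Thread(target=reader) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Which writer wins is not determined, and that is fine. What matters is that every read returns a matching value/version pair, so torn_reads stays empty.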

The other catch: the buffer is per-JVM. If two Lucee nodes share the same Redis and the same data lives on both — most commonly sessionCluster=true without sticky sessions, where a user’s session can bounce between nodes — node A serves stale reads after node B’s writes. This PR closes the intra-JVM races (above) and adds a new nearCache=false opt-out for the deployments that need it — see Clustering opt-out below.

For a CFML dev, the bugs surface as:

  • cacheGet() returns the wrong object — sometimes literally a different object stored under a different key
  • Another thread mutating “your” cached struct silently corrupts everyone else’s reads
  • Redis-backed sessions dropping writes that your CFML clearly performed
  • sessionInvalidate() not actually invalidating — the next request with the old cfid still works
  • Session variables set inside a cfthread getting lost after thread.join()
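
The second bullet (shared references) is the easiest one to reproduce in miniature. The Python sketch below is illustrative only: SharedRefCache and CopyOnReadCache are hypothetical names, and deep-copying on read is one common remedy, not necessarily what the extension does.

```python
import copy


class SharedRefCache:
    """Returns the stored object itself, so every caller shares one instance."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)


class CopyOnReadCache(SharedRefCache):
    """Hands each caller a private deep copy (one possible remedy)."""

    def put(self, key, value):
        # Copy on write too, so later changes by the writer do not leak in.
        super().put(key, copy.deepcopy(value))

    def get(self, key):
        return copy.deepcopy(super().get(key))


def mutate_then_reread(cache):
    cache.put("cart", {"items": ["book"]})
    first = cache.get("cart")
    first["items"].append("pen")       # the caller edits "its" struct
    return cache.get("cart")["items"]  # what the next reader sees


shared_result = mutate_then_reread(SharedRefCache())
copied_result = mutate_then_reread(CopyOnReadCache())
```

With the shared-reference cache the next reader sees the pen that was never put into the cache; with copy-on-read it still sees only the book.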

Closes

Ticket                  Symptom
LDEV-4413               cacheGet returns a shared object reference; concurrent readers can mutate the cached value
LDEV-6327               Stale-wins on rapid duplicate puts; reads can see older writes for up to the drain window
LDEV-2135               Session variables set after thread.join() lost on next request (sessionCluster=true)
LDEV-4408               application action="update" destroys session variables under Redis storage
LDEV-6046 (Redis path)  sessionInvalidate() doesn’t remove the session key from Redis
GH #13, #5              Same shape as the above

In production these surface as occasional lost form data, last-write-doesn’t-win on session updates, intermittent logout bugs, and sessions surviving sessionInvalidate().

Performance

Bonus: the rewrite is also a lot faster on the same paths. The new map+queue does O(1) lookups where the old code did a linear scan, so the same change that fixed the races also picked up real headroom under contention. JFR confirms the hot path that previously dominated CPU under load is no longer measurable.
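
To show why a map+queue helps on both counts, here is a hypothetical Python comparison. The post does not describe the old code beyond "linear scan", so ScanBuffer is only one plausible way a scan-based buffer can return a stale value; IndexedBuffer is likewise a sketch of the map+queue shape, not the extension's Java implementation.

```python
from collections import deque


class ScanBuffer:
    """Hypothetical buffer: a plain queue, searched from the oldest entry."""

    def __init__(self):
        self.entries = deque()   # (key, value) pairs in arrival order

    def put(self, key, value):
        self.entries.append((key, value))

    def get(self, key):
        # O(n) scan; the first match is the OLDEST pending write for the key.
        for k, v in self.entries:
            if k == key:
                return v
        return None


class IndexedBuffer:
    """Map for lookups plus a queue for flush order."""

    def __init__(self):
        self.latest = {}         # key -> newest pending value
        self.order = deque()     # keys in first-arrival order

    def put(self, key, value):
        if key not in self.latest:
            self.order.append(key)
        self.latest[key] = value      # a duplicate put overwrites in place

    def get(self, key):
        return self.latest.get(key)   # O(1) on average


def rapid_duplicate_puts(buf):
    buf.put("session:42", "v1")
    buf.put("session:42", "v2")
    return buf.get("session:42")


scan_result = rapid_duplicate_puts(ScanBuffer())
indexed_result = rapid_duplicate_puts(IndexedBuffer())
```

The scan returns v1 (stale wins) and pays for a walk through the queue; the indexed version returns v2 with a single hash lookup.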

My local test bench suite runs in 55s vs 185s with the previous snapshot (4.0.1.3-SNAPSHOT).

Headline throughput numbers (ABBA, 100k cycles, 16 threads):

Scenario                  4.0.1.3 ops/s   4.1.0.0 ops/s   Δ
put-get-different         1,370           108,542         +6510%
put-get-same              4,421           112,829         +2452%
duplicate-put             4,815           83,795          +1640%
hot-key-duplicate-put     18,714          110,232         +489%
get-burst                 2,679           10,261          +283%
put-small (uncontended)   99,436          113,000         +14%
get-hot (uncontended)     87,592          88,009          +0.5%

The multipliers measure the bug-path tax in isolation: the synthetic 100k+ ops/s figures assume the JVM is dedicated to cache calls. Real apps won’t see a literal 60× speedup, but expect lower per-op CPU, ~10× less GC pressure, and better tail latency under cache contention. Uncontended paths are unchanged or slightly better, with no regressions.

Correctness on the bug paths (deterministic testbed, 16-thread contention):

Test                                   4.0.1.3    4.1.0.0
LDEV-4413 read aliasing (50 trials)    22 fail    0 fail, 50 pass
LDEV-4413 write aliasing (50 trials)   50 fail    0 fail, 50 pass
LDEV-6327 stale-wins (100 trials)      78 stale   0 stale, 100 pass

Bonus: clustering opt-out

New nearCache init argument (default true, existing behaviour). Set to false to disable the in-process near cache entirely — every get/put hits Redis directly. The LDEV-6327 stale-wins fix above closes the intra-JVM race; this flag closes the inter-JVM one.

Most users should leave it on. The default is correct for the common cases: single Lucee node, or multiple Lucee nodes with sticky sessions, or multiple nodes each using their own logical cache regions. The near cache is a single-JVM optimisation and that’s fine as long as the same data isn’t being written from more than one JVM.

Turn it off only if:

  • You’re running sessionCluster=true and your load balancer does not do sticky sessions — i.e. user requests can bounce between Lucee nodes mid-session. Each node’s near cache will lag the others’ session writes.
  • You’re sharing application cache regions across Lucee nodes and your app actually relies on cross-node read-after-write consistency.
  • (Future) You’re using a Redis Cluster topology — the extension doesn’t support that yet (tracked separately under PR #18d / LDEV-4579), but when it lands, nearCache=false will be the safe default for clustered backends.
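
The first case above (sessions bouncing between nodes) can be modelled in a few lines of Python. This is a toy sketch under stated assumptions: Node, its near_cache argument, and the dict standing in for Redis are hypothetical and only mirror the behaviour this post describes; they are not the extension's API or configuration syntax.

```python
class Node:
    """One Lucee node in a toy model; `redis` is a dict shared by all nodes."""

    def __init__(self, redis, near_cache=True):
        self.redis = redis
        self.near_cache = near_cache
        self.pending = {}            # per-node buffer, invisible to other nodes

    def put(self, key, value):
        if self.near_cache:
            self.pending[key] = value   # buffered until drain() runs
        else:
            self.redis[key] = value     # written through immediately

    def get(self, key):
        if self.near_cache and key in self.pending:
            return self.pending[key]
        return self.redis.get(key)

    def drain(self):
        self.redis.update(self.pending)
        self.pending.clear()


def bounce_between_nodes(near_cache):
    redis = {"session:42": {"cart": 0}}
    node_a = Node(redis, near_cache)
    node_b = Node(redis, near_cache)
    node_a.put("session:42", {"cart": 1})   # request 1 lands on node A
    return node_b.get("session:42")         # request 2 lands on node B first


with_near_cache = bounce_between_nodes(True)
without_near_cache = bounce_between_nodes(False)
```

With the near cache on, node B still reads the old cart because node A has not drained yet; with it off, node B sees the write immediately, at the cost of a Redis round-trip on every call.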

Cost of opting out (16-thread bench, 100k cycles):

Workload                    near cache on   near cache off   Δ
hot-key-write-burst         67k ops/s       16k ops/s        4.2× slower
hot-key-duplicate-put       75k ops/s       28k ops/s        2.7× slower
put-get-same                73k ops/s       41k ops/s        1.8× slower
hot-key-put (unique keys)   64k ops/s       65k ops/s        unchanged
put-small                   73k ops/s       65k ops/s        -11%

JFR confirms the gap is allocation pressure in the Redis response parser (0.8% → 12.5% of total allocation) plus pool contention (4× more thread parking events) — both expected when bypassing the batched async drain.