Timeout on doJoin in Redis extension

Hey guys,

Hope you can help me out with an issue I have with the Redis extension.
We’re trying to move to the Cloud, using Google’s GKE platform.

One of the things we are trying to resolve is to create a persistent caching mechanism that can be shared among the different pods in the cluster.

Our first attempt is to use Redis for that purpose.

This all seems to work fine until we add load to the cluster; then the extension runs into a timeout in the doJoin function:

lucee.runtime.exp.RequestTimeoutException: request /index.cfm (/var/www/index.cfm) has run into a timeout (timeout: 50 seconds) and has been stopped. The thread did start 52012ms ago.
at lucee.extension.io.cache.redis.RedisCache$Storage.doJoin(RedisCache.java:602)
at lucee.extension.io.cache.redis.RedisCache.getCacheEntry(RedisCache.java:121)
at lucee.extension.io.cache.redis.RedisCache.getCacheEntry(RedisCache.java:159)
at lucee.extension.io.cache.redis.CacheSupport.getValue(CacheSupport.java:103)
at lucee.runtime.functions.cache.CacheGet._call(CacheGet.java:96)
at lucee.runtime.functions.cache.CacheGet.call(CacheGet.java:42)

At this point we’re replacing various caching mechanisms we already had in place, which cached either in the Application/Server scope or in MySQL tables.

It is not uncommon for a single request to fetch between 300 and 400 keys from Redis.
The contents of the keys vary from short strings to large chunks of pre-rendered HTML and (large) MySQL result sets.
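Simplified, the per-request access pattern looks roughly like this (the key names and the cache connection name are placeholders, not our real ones):

    // simplified sketch of our per-request access pattern
    // in practice the keys array holds 300-400 entries
    keys = ["fragment:header", "fragment:footer", "query:products:list"];
    data = {};
    for (key in keys) {
        // false = return null instead of throwing when the key is missing
        data[key] = cacheGet(key, false, "redisCache");
    }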

We are running our application on GKE with a Docker image based on the lucee/lucee:5.3-nginx image.

We installed the beta Redis driver extension, version 3.0.0.6-BETA.
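For reference, the cache connection is defined roughly like this in our Application.cfc (simplified; the host/port values and connection name are placeholders, and the custom field names may differ between extension versions):

    component {
        this.name = "myApp";
        // class name taken from the stack trace above; custom fields are placeholders
        this.cache.connections["redisCache"] = {
            class: "lucee.extension.io.cache.redis.RedisCache",
            custom: {
                "host": "10.0.0.3",  // Memorystore / VM IP
                "port": 6379
            }
        };
    }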

We tried the Google-managed Memorystore Redis service, which is Redis 5, and we also deployed a Redis 6 Docker container on a Google Compute Engine VM.

Both Redis setups run into the same problem under load.

Has anybody experienced this type of issue with the Redis extension before?

Any help would be much appreciated.

Thanks

Jon, something to consider is to monitor what’s going on inside of Redis during the times you’re experiencing the problem. (Or if you already did, what did it show?)

Since Redis is written in C, we can’t use FusionReactor or other Java monitoring tools (which some here may be more familiar with), but if you do a Google search simply for “monitor redis”, you will readily find several options, from built-in tools (redis-cli’s MONITOR and SLOWLOG commands, for example) to tools which can be added (most for free).

As you (or other readers) may consider them, do beware that there are at least a few things worth watching, some more important than others depending on the problem:

  • redis metrics

  • system metrics

  • requests processed by redis

Different tools will offer some of these, while others may offer only one and not the rest. Indeed, I’d argue from experience that the last one may be the most important. Users of Lucee or CF will recognize the idea from a tool like FR, which can track every current and past page request.

With that sort of diagnostic you can see WHAT is going on in Redis in response to your actions (or those of users), including WHAT statement is running, how long it has been running, and more. Sometimes much more, depending on the tool.
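To complement that on the Lucee side, you could also time each cache read in your own code and log the slow ones, so you can correlate them with what the Redis-side monitor shows at that moment. A minimal sketch (the threshold, log file name and cache name are just placeholders):

    // time each cache read and log slow ones, to correlate with redis-side monitoring
    function timedCacheGet(required string key, string cacheName = "redisCache") {
        var start = getTickCount();
        var value = cacheGet(arguments.key, false, arguments.cacheName);
        var ms = getTickCount() - start;
        if (ms > 500) { // arbitrary "slow" threshold
            writeLog(text="slow cacheGet: #arguments.key# took #ms#ms", file="redis-timing");
        }
        return value;
    }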

Sure, sometimes the problem IS simply a system resource allocation problem (running out of CPU or memory, for example), so watching those can be important also. And it may well be that some configuration setting in Redis DOES need to be tweaked for your load, or for the nature of your Redis requests, like their size and the frequency of calls in a request. (And maybe something can be done in your code, or in Lucee, or in the Redis client config to help there, as in the sketch below.)
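As one example of what might be done in your code (an assumption on my part about your key naming, not something I know about the extension’s internals): if many of those 300-400 keys per request share a prefix, batching reads with Lucee’s cacheGetAll could cut the number of round trips. Whether that maps to an efficient operation in the Redis extension is worth verifying before relying on it.

    // sketch: fetch a family of keys in one call instead of hundreds of single reads
    // (assumes your keys share a prefix; "page:home:" and the cache name are made up)
    entries = cacheGetAll("page:home:*", "redisCache"); // struct of key -> value
    for (key in entries) {
        // use entries[key] ...
    }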

But it’s worth noting also that sometimes one may find that the hangup is not even in the thing you’re focused on (redis, in this case), but something else running alongside it, though I realize if you have a redis container you could reasonably assume that redis is indeed the “only” thing running in that container.

Anyway, the challenge in such situations, of course, is to know which is the chicken and which is the egg (between resource constraints and request traffic). But sometimes you MAY find that the traffic is quite different from what you expected, and it may even turn out that there IS some problem with the client.

And maybe someone here will spot your issue readily and propose a solution. If not, I hope what I offer above may help.

In doing such troubleshooting with users of Lucee and CF, I often help people see that shining a light into their “black box” is just the ticket to getting to an ultimate resolution. Otherwise you may find all kinds of proposed “solutions” on the web, with people proposing various knobs to fiddle with, which is instead groping in the dark.

Looking forward to hearing what more you may learn and if you may resolve things. I realize that could come without you needing any of the above. In either case, perhaps what I share may help others, whether with other redis problems or with still other server problems.

There are some unreleased improvements for the Redis extension.

Check out a copy of the source and run ant to make your own build.