A bit of a strange issue that I wonder if anyone else has run into. Yesterday I moved my session storage from an Ehcache cache (Lucee built-in) to a memcached server hosted on AWS ElastiCache.
It seemed to work without issue at first, but after roughly 45 minutes to an hour the Lucee server would simply stop responding to requests. There were no messages in any of the logs, and the memcached server was still working fine. The only way to recover was to restart Tomcat. Moving back to Ehcache, with nothing else changed, stopped Lucee from dying.
Any ideas? Anyone else use memcached for session store with Lucee?
Java 1.8.0_121 (64bit)
Do you have multiple web contexts or a single context for the server?
Multiple: 4 contexts on the same server, and the cache is set up at the server level.
Will check with the team but I’m pretty sure we had to roll over to database storage on a multi-context implementation as this was behaving strangely for us even in 4.5(?).
I believe it’s fine in a single context environment which might explain why it’s slipped through testing. We probably don’t run up against it for docker implementations as they are all single context.
We had a similar “multi web context” related issue a long while ago with Railo 4.2 and the Memcached extension, on servers which had about 20 web contexts where we wanted to use Memcached for sessions in each web context. However, I’m not sure that it is the same issue as yours…
The scenario we had was that the Memcached extension would be installed in the server, configured as the session cache in each web context, and it would be working fine for X hours (perhaps more than a day), then for some reason it would disappear from the Railo admin as if it wasn’t installed, and the applications would crash due to the missing cache.
If you configured a Memcached cache for sessions in a single web context only, it seemed to be stable.
When you say the server would “stop responding to requests”, are you getting a timeout? Is CPU usage high, and is the JVM healthy? (You might need to take a thread dump.) Have you also tried running these applications on servers with a single web context to see whether the issue still occurs?
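In case it helps with the thread dump: running jstack against the Tomcat process ID from outside is the usual route, but the same information is available inside the JVM through the standard java.lang.management API. Below is a minimal sketch, not anything Lucee-specific, that lists the live threads that have consumed the most CPU time along with the top of their stacks. The class name BusyThreads and its methods are made up for illustration, and it would have to run inside the affected JVM (for example from a background thread), which may not be possible once the server has stopped answering.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper: ranks live JVM threads by accumulated CPU time.
public class BusyThreads {

    /** One live thread plus the CPU time it has consumed so far. */
    public static final class Sample {
        public final ThreadInfo info;
        public final long cpuNanos; // -1 when the JVM cannot measure it

        Sample(ThreadInfo info, long cpuNanos) {
            this.info = info;
            this.cpuNanos = cpuNanos;
        }
    }

    /** Returns up to {@code limit} live threads, highest CPU time first. */
    public static List<Sample> topByCpu(int limit) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        boolean canMeasure = mx.isThreadCpuTimeSupported() && mx.isThreadCpuTimeEnabled();
        List<Sample> samples = new ArrayList<>();
        // false, false: skip lock and monitor details to keep the dump cheap
        for (ThreadInfo info : mx.dumpAllThreads(false, false)) {
            long cpu = canMeasure ? mx.getThreadCpuTime(info.getThreadId()) : -1L;
            samples.add(new Sample(info, cpu));
        }
        samples.sort(Comparator.comparingLong((Sample s) -> s.cpuNanos).reversed());
        return samples.subList(0, Math.min(limit, samples.size()));
    }

    /** Formats a thread's name, state, CPU time and first few stack frames. */
    public static String describe(Sample s) {
        StringBuilder out = new StringBuilder();
        out.append('"').append(s.info.getThreadName()).append("\" state=")
           .append(s.info.getThreadState())
           .append(" cpuMs=").append(s.cpuNanos < 0 ? -1 : s.cpuNanos / 1_000_000);
        StackTraceElement[] stack = s.info.getStackTrace();
        for (int i = 0; i < Math.min(5, stack.length); i++) {
            out.append(System.lineSeparator()).append("    at ").append(stack[i]);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        for (Sample s : topByCpu(5)) {
            System.out.println(describe(s));
        }
    }
}
```

If one or two threads dominate and their stacks sit in cache or socket code, that would at least point towards the session cache rather than the application.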
We do use AWS ElastiCache (Memcached) with our own caching plugin (which uses a Java client rather than the Lucee memcached extension) and haven’t had any stability issues that I can recall.
Just to expand on this point, we do currently use the Memcached extension in Lucee 4.5, configured as a session cache for applications which run in a single web context server, and we’ve had no issues.
These apps run as Docker containers across 3 Docker hosts, and although we do have the session affinity set to sticky sessions via HAProxy, we can bring up new containers and then bring down the old containers without loss of user session data. The uptime on these containers is typically several days to a couple of weeks depending on what needs to be deployed.
For reference, these are the files that we add to the Docker image:
/memcached/126.96.36.199.rep -> /opt/lucee/server/lucee-server/context/extensions/22E5066D7B123C5D4898C712C0438CFA/
/memcached/MemCached.cfc -> /opt/lucee/server/lucee-server/context/context/web-context-deployment/admin/cdriver/
/memcached/MemCached.cfc -> /opt/lucee/web/context/
/memcached/commons-pool-1-5-6.jar -> /opt/lucee/server/lucee-server/context/lib/
/memcached/lucee-extension-memcached.jar -> /opt/lucee/server/lucee-server/context/lib/
/memcached/memcached-3-0-2.jar -> /opt/lucee/server/lucee-server/context/lib/
Are your file versions the same, or perhaps older/newer?
Thanks Justin. Unfortunately, my scope to try things is a bit limited, as I’ve only seen this issue on my production server. I’m assuming it must be related to load, so I’m going to see if I can reproduce it in my development environment by applying some load to it.
With regard to what I was seeing: the server would stop responding to requests completely. No timeouts, nothing in any of the log files, etc. Looking at the threads in top, I could see the Java process for Tomcat running at around 99% CPU, but apart from that nothing else.
Also, I’m wondering if configuring the cache within each web context, instead of at the server level, might help. Well, at least that gives me a starting point for things to try. Thanks.
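On the 99% CPU reading: top -H lists individual threads with decimal IDs, while a Java 8 jstack dump identifies each thread by the same ID in hex as nid=0x.... A small sketch of matching the two is below, assuming the dump has been saved as text. NidLookup and its method names are invented for the example, and the parsing only assumes the usual jstack layout of a quoted thread name on the header line and a blank line between entries.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: finds the jstack entry for a thread ID seen in `top -H`.
public class NidLookup {

    /** Converts a decimal OS thread ID into the nid token jstack prints. */
    public static String toNid(long osThreadId) {
        return "nid=0x" + Long.toHexString(osThreadId);
    }

    /** Returns the lines of the dump entry whose header carries the given thread ID. */
    public static List<String> findThread(String dump, long osThreadId) {
        String nid = toNid(osThreadId);
        List<String> entry = new ArrayList<>();
        boolean inEntry = false;
        for (String line : dump.split("\\R")) {
            if (line.startsWith("\"")) {
                // Header line of a new thread entry; the trailing space
                // stops 0x3039 from matching 0x30391.
                inEntry = (line + " ").contains(nid + " ");
            }
            if (inEntry) {
                if (line.trim().isEmpty()) {
                    inEntry = false; // a blank line ends the entry
                } else {
                    entry.add(line);
                }
            }
        }
        return entry;
    }

    public static void main(String[] args) {
        // Thread ID 12345 from top corresponds to nid=0x3039 in the dump
        System.out.println(toNid(12345));
    }
}
```

The stack of whichever thread is pinning the CPU should say more than the overall process figure does.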