Building a highly scalable clustered web app

I hope this is a good place to start this discussion. If not, I’m happy to remove this post.

I’ve got an app that’s currently running on AWS that I would like to make highly scalable in a horizontally clustered environment. I’ve got my VPC all set up, and the web server resides in a private subnet behind a public-facing load balancer. No data resides on the server.

My next step is to ensure that my session management code is cluster-friendly. I would like to know what people think in terms of what kind of patterns I should follow when managing session/client data, and what tips/tricks and configurations have been proven to work with Lucee apps.

The app is currently running on a single Windows Server 2012 R2 instance, but I am planning to migrate over to Linux. As a first step in that transition, I’m planning to replace IIS/Boncode with NGINX, which I don’t think will be a problem since we currently have very low traffic (NGINX on Windows is still considered beta and is not meant for large-scale apps).

I want to avoid sticky sessions if at all possible. I was thinking that I would create a standalone EHCache node to handle session and client data.

The app will also utilize websockets, and I expect to have at least one dedicated websocket server. I’m planning to use Igal Sapir’s extension for this, though I understand that it’s very much in the development stage.

What about my code? What patterns do I need to avoid or adhere to? Do I need to lock the session before all writes? Are there any application.cfc considerations with regard to onSessionStart() and onSessionEnd()?

Right now, my code copies the user’s session data to the request scope in onRequestStart() and then back to the session scope in onRequestEnd(). During the course of the request, all reads and writes of session data are done via methods that read or modify the request scope’s copy, which then gets copied back to the session. I can see a drawback to this: if two requests for the same session overlap, the one that finishes last overwrites the other’s changes, so you would need to merge your changes back into the session rather than overwrite it.
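To make the merge concern concrete, here is a minimal sketch in Python (not CFML, and not the code from my app). It assumes the request-scoped copy records which keys were written or deleted, so that only those keys are applied back at the end of the request. The class name `RequestSessionView` and its methods are hypothetical, made up for illustration.

```python
from typing import Any


class RequestSessionView:
    """Request-local working copy of session data that records its changes.

    Hypothetical helper: stands in for the copy made in onRequestStart().
    """

    def __init__(self, session: dict) -> None:
        self._data = dict(session)      # shallow snapshot taken at request start
        self._written: set = set()      # keys assigned during this request
        self._deleted: set = set()      # keys removed during this request

    def get(self, key: str, default: Any = None) -> Any:
        return self._data.get(key, default)

    def set(self, key: str, value: Any) -> None:
        self._data[key] = value
        self._written.add(key)
        self._deleted.discard(key)

    def delete(self, key: str) -> None:
        self._data.pop(key, None)
        self._written.discard(key)
        self._deleted.add(key)

    def merge_into(self, session: dict) -> None:
        """Apply only this request's changes; untouched keys are left alone."""
        for key in self._written:
            session[key] = self._data[key]
        for key in self._deleted:
            session.pop(key, None)


# Two overlapping requests for the same session.
shared = {"user_id": 42, "cart": 1}
request_a = RequestSessionView(shared)
request_b = RequestSessionView(shared)

request_a.set("cart", 2)          # request A updates the cart
request_b.set("theme", "dark")    # request B touches a different key

request_a.merge_into(shared)
request_b.merge_into(shared)      # does not reset "cart" back to 1
```

With a whole-copy overwrite, request B would have written its stale `cart` value back. Merging by key avoids that, though two requests writing the same key still end up as last-writer-wins.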

I’m guessing I’ll need to use RabbitMQ or something similar to manage session state between web servers and websocket servers, and to broadcast websocket messages from web servers to the websocket servers.

Any thoughts or ideas are welcome and appreciated!

Have you considered a Docker-based solution running in AWS instead? We’ve moved from EC2 instances running apps to EC2 instances running a multi-AZ node cluster and deploying containers.

You can use an AWS ElastiCache memcached store for your Lucee sessions (and flag it as multi-AZ too).

Store as little as possible in the session scope itself. You just want the session to allow you to identify an authenticated user as they move to different members of the cluster.

We turn on sticky sessions in addition to having a central session store. You can develop strategies to keep unchanging “session state” data on the instance the user first binds to, and refresh that “session state” if they get flicked to a different member of the cluster.
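As a rough illustration of that refresh strategy, here is a minimal Python sketch (not Lucee code). It assumes each cluster member keeps a local copy of a user's unchanging session state and falls back to the central store on a miss. `CentralStore` is an in-memory stand-in for a shared cache such as memcached, and every name in it is hypothetical.

```python
class CentralStore:
    """In-memory stand-in for a shared session store (hypothetical)."""

    def __init__(self) -> None:
        self._items: dict = {}

    def put(self, session_id: str, state: dict) -> None:
        self._items[session_id] = dict(state)

    def get(self, session_id: str):
        state = self._items.get(session_id)
        return dict(state) if state is not None else None


class Node:
    """One cluster member with a node-local cache of session state."""

    def __init__(self, name: str, store: CentralStore) -> None:
        self.name = name
        self.store = store
        self.local: dict = {}
        self.refreshes = 0      # how often we had to go to the central store

    def login(self, session_id: str, state: dict) -> None:
        # Write through: the central store is the source of truth.
        self.store.put(session_id, state)
        self.local[session_id] = dict(state)

    def session_state(self, session_id: str) -> dict:
        state = self.local.get(session_id)
        if state is None:
            # User was routed here from another node: refresh from the store.
            state = self.store.get(session_id)
            if state is None:
                raise KeyError(f"unknown session {session_id!r}")
            self.local[session_id] = state
            self.refreshes += 1
        return state


store = CentralStore()
node_a = Node("a", store)
node_b = Node("b", store)

node_a.login("s1", {"user_id": 42})
on_a = node_a.session_state("s1")           # served from node A's local copy
on_b = node_b.session_state("s1")           # first hit on B triggers a refresh
on_b_again = node_b.session_state("s1")     # now local to B as well
```

This only works as written for data that does not change after login. Anything mutable would need invalidation or a read from the central store on every request.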

I’d try to avoid locking if possible. You’re not going to corrupt the server like in the pre-Java days. The only danger is race conditions on your session data. Depending on your app, these rare occurrences may be completely benign in any event.
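If a particular race does matter, one lock-free option is an optimistic version check: reject a write when the data changed since it was read, then retry. This is a swapped-in technique for illustration, not something Lucee does for you. The Python sketch below uses a hypothetical in-memory `VersionedStore`; memcached's check-and-set operation is built on the same idea.

```python
class VersionedStore:
    """In-memory store that versions each session record (hypothetical)."""

    def __init__(self) -> None:
        self._items: dict = {}   # session_id -> (version, data)

    def read(self, session_id: str):
        version, data = self._items.get(session_id, (0, {}))
        return version, dict(data)

    def write_if_unchanged(self, session_id: str, expected_version: int,
                           data: dict) -> bool:
        current_version, _ = self._items.get(session_id, (0, {}))
        if current_version != expected_version:
            return False         # someone else wrote first; caller must retry
        self._items[session_id] = (current_version + 1, dict(data))
        return True


def update_with_retry(store: VersionedStore, session_id: str, change,
                      attempts: int = 3) -> bool:
    """Re-read, re-apply the change, and retry when the version check fails."""
    for _ in range(attempts):
        version, data = store.read(session_id)
        change(data)
        if store.write_if_unchanged(session_id, version, data):
            return True
    return False


store = VersionedStore()

# Two requests read the same version before either one writes.
version_1, data_1 = store.read("s1")
version_2, data_2 = store.read("s1")

data_1["cart"] = 2
first_ok = store.write_if_unchanged("s1", version_1, data_1)

data_2["theme"] = "dark"
second_ok = store.write_if_unchanged("s1", version_2, data_2)   # stale, rejected

# The second request retries against fresh data instead of clobbering the cart.
retried_ok = update_with_retry(store, "s1", lambda d: d.update(theme="dark"))
final_version, final_data = store.read("s1")
```

Whether the extra round trip is worth it depends on the app. For benign races, plain last-writer-wins is simpler.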

When in AWS you might consider the various message queue services AWS offer rather than going to the trouble of building and maintaining your own.
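Whichever queue or messaging service you pick, the broadcast pattern is a fan-out: a web server publishes once and every websocket server subscribed to the topic receives a copy. The Python sketch below shows only that shape with a hypothetical in-memory `Broker`. It is not the API of any AWS service or of RabbitMQ, and the topic name and classes are invented for illustration.

```python
from collections import defaultdict


class Broker:
    """In-memory stand-in for a managed pub/sub service (hypothetical)."""

    def __init__(self) -> None:
        self._subscribers = defaultdict(list)   # topic -> list of handlers

    def subscribe(self, topic: str, handler) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, message: dict) -> int:
        """Deliver the message to every subscriber; return how many got it."""
        handlers = list(self._subscribers[topic])
        for handler in handlers:
            handler(message)
        return len(handlers)


class WebSocketServer:
    """Stand-in for a dedicated websocket server that relays to its clients."""

    def __init__(self, name: str, broker: Broker) -> None:
        self.name = name
        self.delivered: list = []
        # A real server would push each message to its connected sockets here.
        broker.subscribe("broadcast", self.delivered.append)


broker = Broker()
ws_1 = WebSocketServer("ws-1", broker)
ws_2 = WebSocketServer("ws-2", broker)

# A web server handling an HTTP request publishes one message.
message = {"event": "order_updated", "order_id": 7}
receivers = broker.publish("broadcast", message)
```

Each websocket server only needs to know about the topic, not about the web servers, so either tier can scale independently.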

I’m familiar with these terms but I haven’t used any of this technology yet. I definitely want to build disposable machine images and script their deployment, if that’s what you mean. I think I need to dive into Docker. I’ve played around with Packer and Vagrant a bit.

What is a multi-AZ node cluster?

Docker containers are the endpoint on the continuum for “disposable machine images”.

Multi-AZ is shorthand for “multiple availability zones”. In AWS speak these are different data centres within the same region. For example, the Sydney region has three separate data centres.

Ah yes of course! I do have my VPC built in multiple availability zones, but obviously with one server, I’m not taking advantage of it.