Any resources or pointers for writing Lucee code to support concurrent users?

justaguy · March 1, 2020, 2:54pm

Say, 50 to 100 concurrent users to start?
Thanks.

bennadel · March 1, 2020, 8:50pm

When I think about concurrency, the only real issue that pops to my mind is attempting to access shared memory that is not thread-safe (like having two users try to iterate over a single Array - that can lead to deadlocks). Structs / Scopes in Lucee are inherently thread-safe, so two users should be able to read / write to a shared struct (at least from what I understand).

The more you avoid shared state, the less you have to worry about concurrency.

Is there something in particular you are thinking about?

justaguy · March 2, 2020, 12:18am

Ben, thanks for your insightful response. Can you provide some example code of “shared-memory” and “shared state” cases?

And yes, I’m thinking about “common” data read and write. For instance, we have a db table called “transactions”. A read query like “select * from transactions where userid = session.userid” (for simplicity’s sake, forget about the cfqueryparam tag for now) should be ok with many concurrent users/queries, and such data inserts should also be ok since we have cftransaction to encapsulate them and make the sql transactions sequential.

joe.gooch · March 2, 2020, 4:16pm

The question you’re asking is very broad. Entire books are dedicated to concurrency. Most of the concepts are language agnostic, but there will always be language specific implementation notes as well.

You should familiarize yourself with the vocabulary of the field.

locking - method to ensure serialized access
optimistic locking - “Most of the time we shouldn’t have a problem with concurrency, so let’s trust but verify - we’ll run the change and then verify it worked”
pessimistic locking - “Nothing can be trusted. Ensure I have exclusive locks, make my changes, and THEN other people can see what I’ve done.”

Methods of locking - semaphores, spinlocks, countdown latches, transactions, transaction isolation levels, mutual exclusion/mutex, reentrant locking

Objects doing “Pass by Reference” vs “Pass by Value”
Race conditions
Shallow copies vs Deep copies

As with everything in programming, it’s partially science and partially art.

First thing that’s important is that every aspect of computing has concurrency issues. You have MANY core processors, all doing things, sharing memory, cpu, networks, disks and filesystems, objects and data - and any one of those things could cause you a concurrency issue. The tools you use to approach that change based on what you’re trying to accomplish and protect.

Second, every language or system is going to have different tools to address the problem. Files have locks, Databases have transactions, Applications have memory-based locks.

There is no “one size fits all” technique to eliminate all concurrency issues. Getting it right is HARD. You have to know how the system works, visualize how all the data relates, and come up with your own best practices as to what is acceptable for your application. Race conditions can be REALLY hard to get right. And the nature of development is usually such that you try to replicate your problems in Dev first - which may not have the load and circumstances necessary to cause the problem.

Third, you need to be wary of over-engineering your solution out of the box. 100 users is not a lot. 100000 is a VERY different story. But if you have a storefront with a million products and your 100000 users aren’t going to be looking at or buying the same products, they’re all looking at different things - that might not be a problem. It’s the patterns of data access, tracking how the infrastructure responds and how the program is functioning, profiling long wait times and consistent, incremental optimization that will get you to the end result. If you try to engineer your 100 user site for 100000, you’ll never deliver it. And it’s a waste of time, with little benefit.

Programming is all about picking the best compromise between differing constraints. Let’s assume I’m going to design a garbage can to take my garbage to the curb. It needs to be light, so I can lift it. But it needs to be heavy, so it doesn’t blow away in the wind. So I’ll make it… what? Some balance of the two. Or I’ll err on the side of heavy, and add wheels. Programming is no different. But building a motorized, titanium garbage can that can dump itself just in case I need to shuttle 100 bags of trash to the curb one week isn’t going to be very cost effective.

This is going to be long, so consider this part 1.

joe.gooch · March 2, 2020, 4:16pm

Part 2, let’s talk about CF specifically.

In general your concurrency problems will happen when multiple threads try to change the same thing. The readers (threads trying to read data) need to have a consistent view when they do their read. You don’t want to end up reading out of date data (or do you? if so, how far out of date?), and sometimes you don’t want all the readers to wait in line while one writer (a thread trying to modify data) is doing its job.

In CF, your concurrency issues are going to be:

Persistent scopes that cross request boundaries
Problems introduced by cfthread
Algorithmic problems.

Let’s talk scopes!

Request, URL, FORM, CGI - these are all request specific scopes. You can’t really have readers and writers hitting these things simultaneously, unless you do something dumb like, Session.blah = Request. CFThread, however, could introduce issues, depending on if these scopes are passed by reference, shallow or deep copied… i.e. are these threads ACTUALLY accessing the same object(s) or are they accessing copies? (with the overhead of creating the copies when the threads start)

And then you have interesting questions like if I spawn a thread, I use the Request scope, the original request continues on to completion… what’s the thread’s Request scope look like? Is it a copy of the original? Is it THE original, just java won’t garbage collect it until the threads are done too?

In general, I avoid cfthread, but that’s because in my architecture those things would be done by other services and microservices, not on the frontend web servers.

But we were talking about scopes, soooo…

Application, Session, Server

These are persistent scopes. If you put a value in one of these, multiple threads will likely access it. (Otherwise you’re using the wrong scope.)

Ben mentioned that Lucee’s implementations are thread safe. And that’s GOOD, but it’s not going to COMPLETELY save you from yourself.

It DOES mean that assignments are going to (usually) be safe.

Consider:

In onSessionStart, we set Session.Value = 1;

if (rand()*10 GT 8) {
    Session.Value= rand()*100;
}

WriteOutput(session.value);

Aaaand then you mash on Curl or your browser many times to create concurrency.

The Session.Value assignment boils down to the java Session.put('Value', thenewvalue);, and the implementation in the scope is that essentially the get and put methods have locks around them so they’re safe - which means you aren’t going to get an UNDEFINED value.

But, you could have 10 threads that didn’t do an assignment and 10 that did. When I go to write the output, what has happened between the conditional and the WriteOutput? I could have set value 20 inside the if, and ANOTHER thread changed it just after I did. So there’s no guarantee what I wrote in the if is what I’m going to output. That might be ok. That might not. Depends on what you’re looking to do.

What you could do:

v = Session.Value;
if (rand()*10 GT 8) {
    v = rand()*100;
    Session.Value = v;
}

WriteOutput(v);

This ensures the value you assign (if this thread assigns it) will be what you output in all cases. And it has the exact same number of gets and puts.

If that thing is a counter or balance of dollars, then your rules might be different. In my Session, assume Session.value = 0;

Session.Value = Session.Value + 1;
WriteOutput(Session.Value);

That’s going to count properly, right?

… wrong

Because this simple statement boils down to:

  tmpvar = Session.get('Value');
  tmpvar = tmpvar + 1;
  Session.put('Value', tmpvar);

And remember our LOCKING is happening in the get, and again in the put. So that means the increment in the middle isn’t part of the lock, there are TWO transactions, and time for something else to happen in the middle.

So you COULD cflock… either with a name, or on the Session scope. But remember that cflocks are memory based locks, they only work in your single instance, and you’re limiting your concurrency. Which means only one thread can ever run that code at once, and if it’s on every page, you’ll never reach your concurrent user goals.

What could we do? Well, we could leverage Java… On SessionStart:

  Session.Value = createobject("java","java.util.concurrent.atomic.AtomicInteger").init(0);

In my cfm

  WriteOutput(Session.Value.incrementAndGet());

Which means the reference stored in Session.Value never changes - it always points to the SAME object. The counter inside that object still changes, but every change goes through the object’s own atomic operations, so you get safe updates without wrapping anything in cflock.

It also means the object you’re using is BUILT for the job: incrementing a value safely. Instead of a separate get and put, we have a single transaction that increments the value, persists it and returns the new count.
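
A minimal sketch of the same counter in plain Java (not CFML), for anyone who wants to try it outside of Lucee. The class name, method name and the thread/loop counts are made up for illustration; AtomicInteger and incrementAndGet are the same JDK class and method used in the createobject call above.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class AtomicCounterDemo {
    // Runs `threads` workers that each bump one shared counter `perThread` times.
    static int countWith(int threads, int perThread) {
        AtomicInteger counter = new AtomicInteger(0);
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    counter.incrementAndGet(); // read + add + write as one atomic step
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            try {
                t.join(); // wait for every worker before reading the total
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return counter.get();
    }

    public static void main(String[] args) {
        // With an atomic counter no increment is lost: 8 * 10000 = 80000.
        System.out.println(countWith(8, 10_000));
    }
}
```

Swapping the AtomicInteger for a plain int field with `value = value + 1` would be the Java version of the Session.Value = Session.Value + 1 problem, and the total would usually come up short under load.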

And what about subobjects?

  Session.User = StructNew();
  Session.User.FirstName = "Joe";
  Session.User.LastName = "Gooch";

  WriteOutput("Hello #Session.User.FirstName# #Session.User.LastName#");

If I run the assignment in onSessionStart, and write the output in multiple threads, that’s fine as long as Session.User doesn’t change. What’s happening here?

Well…

  tmp1 = Session.get('User');  // this is thread safe
  tmp2 = tmp1.get("FirstName"); // are these thread safe?
  tmp3 = tmp1.get("LastName");
  WriteOutput("Hello "&tmp2&" "&tmp3);

Are they?..

Depends - is StructNew() threadsafe?

Probably not. At the very least, it’s not part of the session scope, so it’s not going to know to lock on that. It MIGHT be. It MIGHT not be. Depends on the definition of the implementation. And it might be worse - it might change by version of CF, or from ACF to Lucee.

Even if it IS thread safe, if you run the assignment a second time, note that for some period of time:

After the first statement, Session.User is an empty struct, which will cause the display to fail at tmp2. Also note it’s a COMPLETELY DIFFERENT OBJECT - so even if the original struct was thread safe, I’m NOT modifying that struct, I’m creating a new one. So no implicit locking will work.
After the second statement, Session.User has a FirstName but no LastName, which will cause the display to fail at tmp3.
After the third, we’re safe again.

Instead, if the assignment were this

  newVal = StructNew();
  newVal.FirstName = "Joe";
  newVal.LastName = "Gooch";
  Session.User = newVal;

Now I’m not going to have problems. The threads that access Session.User before I reassign it in line 4 are going to get the old structure with the old values. (because it’s passed by reference!) The threads that access Session.User after I reassign it are going to get the new structure. Either way, the structure returned to my display is consistent.
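
The same “build it locally, then publish it with one assignment” idea, sketched in plain Java as an illustration. Everything here is a stand-in: the ConcurrentHashMap plays the Session scope, and the names PublishDemo, setUser and greet are hypothetical, not anything Lucee exposes.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class PublishDemo {
    // Stand-in for the Session scope: get/put on the scope itself are thread safe.
    static final Map<String, Object> session = new ConcurrentHashMap<>();

    // Build the struct completely in a local variable, then publish it with ONE put.
    static void setUser(String first, String last) {
        Map<String, String> newVal = new HashMap<>();
        newVal.put("FirstName", first);
        newVal.put("LastName", last);
        // Readers only ever see a finished, read-only map.
        session.put("User", Collections.unmodifiableMap(newVal));
    }

    @SuppressWarnings("unchecked")
    static String greet() {
        // One get: everything after this line works on a single consistent reference.
        Map<String, String> user = (Map<String, String>) session.get("User");
        return "Hello " + user.get("FirstName") + " " + user.get("LastName");
    }

    public static void main(String[] args) {
        setUser("Joe", "Gooch");
        Object before = session.get("User"); // a reader that grabbed the old struct
        setUser("Jane", "Doe");              // hypothetical second assignment
        System.out.println(before);          // still the old, complete struct
        System.out.println(greet());         // the new, complete struct
    }
}
```

A reader holding the old reference keeps a complete old struct, and a reader arriving after the put gets a complete new one. Nobody sees a half-filled struct.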

So yes, ASSIGNMENT and RETRIEVAL from scopes are thread safe, but you can’t say everything about the scope is thread safe. It’s all about how you use it.

Variables scope - DEPENDS

Why?

Well, Variables scope changes. If it’s a CFC, your Variables scope is the instance data for your object. Which means it’s persistent if the CFC is persistent, and it’s not if it’s not. If you put a CFC in Session scope, guess what, your Variables scope is Session scoped. If you put it in Server scope, it’s server scoped. Imagine if you’re saving a userid in the Variables scope of an Application scoped object - whoever logs in last will be the one shown on EVERY page.

But cfms are safe, right? I mean, it’ll be like a request… Or is it? What happens if I cfinclude a udf from a cfm into a CFC? Guess what’s now unsafe…
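
To make the “whoever logs in last” problem concrete, here is a small Java sketch of the same mistake. It is an analogy, not CFML: the UserService class stands in for a CFC stored in Application scope, and its private field stands in for the CFC’s Variables scope. All names are hypothetical.

```java
public class SharedInstanceDemo {
    // Plays the role of a CFC stored in Application scope: one instance for everyone.
    static class UserService {
        private String userId; // like Variables.userid inside the CFC

        void login(String id) {
            this.userId = id; // overwrites whatever the previous user stored
        }

        String currentUser() {
            return userId;
        }
    }

    // The single shared instance, like Application.userService.
    static final UserService APPLICATION_SERVICE = new UserService();

    public static void main(String[] args) {
        APPLICATION_SERVICE.login("alice"); // request from user 1
        APPLICATION_SERVICE.login("bob");   // request from user 2
        // Both users now see whoever logged in last.
        System.out.println(APPLICATION_SERVICE.currentUser()); // prints "bob"
    }
}
```

No threads are even needed to show it: the instance data simply lives as long as the object does, so per-user data belongs in a per-user object.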

Some basic ground rules

cflock only works on 1 server. If you introduce a cluster, a second server, whatever, cflock only guarantees atomic blocks within 1 CF server. It’s based on memory. If you need multiple instances, you need to look elsewhere for distributed locking - it could be your DB, it could be a memory or KV store like Redis/Memcached/Couchbase.
Always be careful using shared scopes. That’s where your issues are going to come from. Err on the side of doing your retrievals into local/request scoped variables, your assignments in local/request scoped variables, and using those instead of referencing Session over and over. “Get” early, and “put” late, and when solving a problem consider the multiple threads case.
Always local scope your variables in EVERY function. You never know when you’ll want that function used in a persistent area. Use tools, regex, etc to find anywhere this ISN’T the case, and fix it. It will be VERY hard to find if you trip over one of these in production.
Always refer to the ENTIRE variable name, with scope, instead of having your CF server guess. This is more efficient, first. Second, it’s much easier to read and see arguments.X is probably safe, Variables.x MIGHT be safe, Session.Y is probably a no no… if our function is something that should be thread safe.
Try to keep your code decoupled from the scopes it’s in. Use something like coldbox/wirebox to put your objects into the correct scopes, but write your objects WITHOUT referring to FORM, URL, Session, etc… unless that’s the SPECIFIC purpose of the CFC, you shouldn’t be doing it. And even then, the proper way to persist something (i.e. username) would be to create a UserProvider in application scope, for instance, that references a UserContext in Session scope - and let wirebox do the dependency injection. The code of UserContext should just be a bean that holds the data and has no reference to scope. Same with the Provider. This allows you to tweak your architecture later without touching LARGE SWATHS of code.
Avoid cfthread unless you need it. If you do use it, be sure you know the implications of locking and shared objects. (I’ll let someone who uses it fill in those details)

And always remember you’re going to be testing it in a 1-user environment, and deploying to a multi-user environment, so testing will be an issue.

Quick addendum - not all concurrency problems are threading related.

Consider deleting items from an array, you might do:

  <cfloop array="#myarray#" index="val">
    <cfif some condition>
      <cfset myarray.delete(val) />
    </cfif>
  </cfloop>

This, again, is implementation specific - what java does under the hood is it creates an Iterator to iterate over the array. This usually means a pointer to the next index in the array. But if you delete something, that pointer now could point to an invalid location. Java prevents this by having a counter in the array, that gets incremented when sets and deletes happen - the iterator checks to make sure the counter hasn’t changed when it goes to the next value. If it has, it throws a concurrent modification exception. It doesn’t track whether it’s threading related or not.

So you CAN have concurrency issues without multiple threads.

Proper way to do this would either be to

copy the array/keyset first, and loop over that, so you have a separate copy while you’re deleting from the original. (i.e. think Duplicate, or if a struct, StructKeyArray or StructKeylist)
If it’s an array, use indexes, and count backwards.

  <cfloop index="idx" from="#ArrayLen(myarray)#" to="1" step="-1">
    <cfset var val = myarray[idx] />
    <cfif some condition>
      <cfset ArrayDeleteAt(myarray, idx) />
    </cfif>
  </cfloop>

In this case, no iterator is created. We count backwards because then there’s no case in which our pointer will be invalid.
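
Since the behaviour described above comes from Java’s collections, it can be reproduced in a few lines of plain Java. This is a sketch using java.util.ArrayList directly. Whether a given CFML engine’s array behaves the same way is implementation specific, as noted above. The class name, method names and sample data are made up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.List;

public class RemoveWhileLoopingDemo {
    // Removing inside a for-each loop: the iterator notices the list changed underneath it.
    static boolean forEachRemoveFails(List<String> items) {
        try {
            for (String item : items) {
                if (item.startsWith("x")) {
                    items.remove(item);
                }
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true; // single thread, still a "concurrent" modification
        }
    }

    // Index-based loop counting backwards: no iterator, and a removal
    // never shifts an element we have not visited yet.
    static List<String> removeBackwards(List<String> items) {
        for (int idx = items.size() - 1; idx >= 0; idx--) {
            if (items.get(idx).startsWith("x")) {
                items.remove(idx);
            }
        }
        return items;
    }

    public static void main(String[] args) {
        List<String> data = Arrays.asList("a", "x1", "b", "x2", "c");
        System.out.println(forEachRemoveFails(new ArrayList<>(data))); // true
        System.out.println(removeBackwards(new ArrayList<>(data)));    // [a, b, c]
    }
}
```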

joe.gooch · March 2, 2020, 4:35pm

Part 3 - Databases

I’ll come from the viewpoint of MSSQL because it’s what I know the best.

Transactions aren’t enough. Transactions behave differently depending on the isolation level. The default is READ COMMITTED. Different databases deal with concurrency differently - some are optimistic, some are pessimistic.

What’s the difference? Let’s assume an account balance example.

  <cfquery name="tmp1">
    select * from Accounts where AccountID=?
  </cfquery>
  <cfset bal = tmp1.balance[1]- 10 />
  <cfquery>
    update Accounts SET balance = #bal# WHERE AccountID=?;
  </cfquery>

This is bad. This is EXACTLY the counter example in the last part. If your database server is pushing 1000 batch requests a second, that’s A LOT of things happening between the first and second query.

Will transactions help? Maybe. Depends. In MSSQL, you’d have to wrap the whole thing in a transaction, run it in READ COMMITTED or SERIALIZABLE, and add a UPDLOCK, HOLDLOCK to the select statement - to tell SQL you’re going to write to that value that you just selected.

But ultimately instead you just want to do it in one operation:

  <cfquery>
    update Accounts SET balance = balance - 10 WHERE AccountID=?;
  </cfquery>

This would be similar to the “increment” approach in the last section.

Now if you want to show this balance - does it HAVE to be the one you just updated it to? Or could it include other transactions?

the “IncrementAndGet” approach would be this

  <cfquery name="tmp1">
    DECLARE @ROWS INT;
    update Accounts SET balance = balance - 10 OUTPUT inserted.balance WHERE AccountID=?
  </cfquery>
  <cfoutput>#tmp1.balance[1]#</cfoutput>

Why the declare? Well… Some versions of CF look at the SQL statement, see UPDATE as the first command, and flip into a different mode, specifically “I’m not going to get a resultset from this, I’m going to get a count of affected rows”… And you end up not getting your result set. This may or may not still be an issue, you should test for yourself.

Now let’s consider transaction isolation:

READ COMMITTED - I only ever want to see up to the minute data. All my readers will block if a write is in progress, until that record is available. If it’s a table lock, that’s a LOT of blocking. If it’s a row lock, it’s more contained… but SQL also does lock escalation, so changing 5000+ rows means it’ll take a table lock instead… So even that can have gotchas. Multiple readers can read the same rows at the same time as long as a writer isn’t in the way.

READ UNCOMMITTED - I can see in progress writes. This is generally a bad idea. If you’ve created a write transaction elsewhere, for instance, to write to 3 related tables, you don’t want to be able to pull the record from the parent table without the child data. You should see the data when it’s consistent - and this leads to dirty reads. Some people do this when locking becomes an issue. It’s generally a bad idea.

SNAPSHOT - This is slightly different. When a writer starts to write, it saves the OLD data from the rows modified into tempdb - so other readers don’t block. If the reader started reading before the update transaction started, it’ll return the old data, snapshotted at the time the reader started and up to the point when the updater actually commits. Once the update commits, the new data is returned and the snapshot is garbage collected once all reader transactions stop referencing it. This can be a really useful tool.

SERIALIZABLE/Exclusive - Readers AND writers block in all cases. Single, sequential access. This is usually something you want to use sparingly, in small areas of the code. (i.e. if you’re reimplementing AUTO_INCREMENT or identity fields for some reason)

Some databases use optimistic locking, which means it’s like snapshot… it won’t block, but they may just throw an error if the data is modified underneath you. MSSQL can do this with RCSI set in the database, I believe oracle does it by default.

Going back to our original (even though I’ve shown better methods)

  <cftransaction isolation="serializable">
    <cfquery name="tmp1">
      select * from Accounts WITH (UPDLOCK,HOLDLOCK) where AccountID=?
    </cfquery>
    <cfset bal = tmp1.balance[1]- 10 />
    <cfquery>
      update Accounts SET balance = #bal# WHERE AccountID=?;
    </cfquery>
  </cftransaction>

This is very pessimistic. I don’t need serializable. Everything will block while I’m writing. And we assume we’re going to have contention, hence we take a draconian approach.

Let’s look at optimistic.

  <cfquery name="tmp1">
    select * from Accounts where AccountID=?
  </cfquery>
  <cfset bal = tmp1.balance[1] />
  <cftransaction isolation="read_committed">
    <!--- the query is named so the row count can be read afterwards as upd.ROWS --->
    <cfquery name="upd">
      declare @rows int;
      update Accounts SET balance = #bal-10# WHERE AccountID=? and balance=#bal#;
      SELECT @@ROWCOUNT AS ROWS
    </cfquery>
  </cftransaction>

See what happened? We assume the balance hasn’t changed, but we add that to the where clause to make sure. If the balance HASN’T changed on us, we do a write and @@ROWCOUNT returns 1. If the balance HAS changed on us, we return 0 and know it didn’t happen. Not shown above would be wrapping the above statement in a retry loop - if it’s returning a 0, you just run it again. If you try 3-4 times and it doesn’t work, throw an error.
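
The retry loop mentioned above could look roughly like this. It is a sketch in plain Java with an in-memory stand-in for the database, so it runs on its own: the AtomicLong plays the Accounts row, and compareAndSet plays the conditional UPDATE (it succeeds only if the value is still what we read, like a row count of 1 versus 0). The names withdraw and maxAttempts are hypothetical.

```java
import java.util.concurrent.atomic.AtomicLong;

public class OptimisticRetryDemo {
    // Tries the "read, then write only if unchanged" step up to maxAttempts times.
    static boolean withdraw(AtomicLong balance, long amount, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            long seen = balance.get();                        // the plain SELECT
            if (balance.compareAndSet(seen, seen - amount)) { // UPDATE ... AND balance = seen
                return true;                                  // "row count" was 1
            }
            // "Row count" was 0: someone changed the balance in between, so re-read and retry.
        }
        return false; // give up; the caller decides whether to throw an error
    }

    public static void main(String[] args) {
        AtomicLong balance = new AtomicLong(100);
        System.out.println(withdraw(balance, 10, 4)); // true
        System.out.println(balance.get());            // 90
    }
}
```

Against a real database the loop body would be the two queries above, with the retry decision based on the returned row count.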

Note the select isn’t even in the transaction - doesn’t have to be, because we’re verifying the results. That’s part of the “assume everything will go fine” optimistic approach. Locking takes work, and causes waits. These are not insignificant things. In the pessimistic approach, locking is part of every transaction. In the optimistic approach, you assume things will go well, and only spend the extra time on retries when they don’t. Whichever is better depends on your data patterns.

There’s a lot more. MSSQL has a READPAST hint. This can be great with queue tables - instead of blocking, it just skips over rows that are locked. If you’re doing a queue, that’s fine: a later poll will catch that in-progress write. If you’re pulling data from a table, having an update cause an entire record to disappear might be… undesirable. I.e. if it’s a users table, that user can’t log in while you’re updating their “last login date”. Better to use SNAPSHOT or READ COMMITTED in that case and let it block. With SNAPSHOT, do you really care that while they’re changing their password, checks will use their old password until that transaction is done? Or that their last login date is slightly off? Probably not. So SNAPSHOT then. If you have a stock ticker, is it bad if the data is slightly out of date? Do you want to hold up the stock ticker as you write data to ALL the rows? Probably not. Note that most news sites say the data can be up to 15 minutes out of date. SNAPSHOT isolation. Done.

If it’s something that MUST be atomic, like making sure they don’t spend more money than they have credit for, then you’re looking at something you need to enforce, and you need a more restrictive isolation level.

So not only do you need to consider these things, but you also have to use the right tool for the job.

i.e. adding a new user

  <cflock name="adding_user_#userid#">
    <Cfquery>
      insert into Table (columns) VALUES (myvalues)
    </cfquery>
  </cflock>

Does this save you? Well… only if you have one single server. Cflocks are memory based, remember?

If the userid is part of the primary key, or you have a unique constraint in the DB, you don’t even need a cflock here. Just trap the DB error if it fails and take corrective action. (That would make this an optimistic approach)

Hopefully that helps.

justaguy · March 2, 2020, 10:40pm

Great insights, Joe, much appreciated.
Let’s talk about cftransaction for Lucee for a moment. For the “isolation” level, what’s the default (of the four options read_uncommitted, read_committed, repeatable_read and serializable)?

joe.gooch · March 3, 2020, 3:05pm

Neither Adobe’s documentation, nor cfdocs.org, shows the default. It’s possible it’s database-driver specific, or influenced by the connection string.

I know in practice, in my environment, the default is read committed. You’ll also notice that snapshot isn’t an available option - because it’s MSSQL specific, not generally available in JDBC. That means in my environment I do

  <cftransaction isolation="read_uncommitted">
    <cfquery>
      SET TRANSACTION ISOLATION LEVEL SNAPSHOT

       ... other stuff ....
    </cfquery>
  </cftransaction>

Adobe’s implementation resets the connection isolation level back to default when the transaction ends, but ONLY if it doesn’t match the default - so I pick something else (in this case read uncommitted) so it’ll clean up after me.

justaguy · March 3, 2020, 3:22pm

Yeah, the docs on the cftransaction tag from both Adobe and Lucee don’t indicate a default value. Wil de Bruin chips in with the following:
According to Adobe

If you do not specify a value for the isolation attribute, ColdFusion uses the default isolation level for the associated database
which is Read Committed for SQL server, and REPEATABLE READ for innodb in MySQL