I recently deployed an application to a production environment and have started getting a few random errors, mostly from bots (Google). The errors say that a particular function is unavailable or that a variable is not defined. The problem is that I’ve never seen this issue in my development environment, and the same code works fine 95% of the time. I’ve checked for scoping issues and am fairly confident they are not the problem (though I could be wrong). I can reproduce the error in production by refreshing a page repeatedly and extremely quickly; it can take 50 or more refreshes for the error to appear.
What other things should I check that might introduce random and infrequent errors that should not technically be possible?
Also, FYI: this production environment has a front proxy server that load-balances across two Lucee servers. Session storage is kept in a database so that sessions are shared across the servers, and session sharing is working fine. I don’t know whether the load balancing contributes to the problem; I’m just trying to give a better picture of what may be going on.
Without seeing the offending code… or a stack trace… the best any of us can probably do is take a stab or two in the dark here.
First, this sounds like a thread-safety issue. In short, one thread (request) is overwriting the values or state of another thread (request). Identifying the culprit and adding locks can fix this. In some cases the code may instead need to be rewritten to avoid the shared state altogether. Again, though, without access to the code or information on how it is being processed, it is hard to say what the root issue is or the best way to solve it.
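To make that failure mode concrete, here is a minimal, language-neutral sketch in Python rather than CFML. It assumes the shared state is something like an application-scope object that gets lazily initialized on first use. All of the names used (`app_scope`, `format_date`, `init_utils_unsafe`, `get_utils_safe`, `count_errors`) are hypothetical.

The unsafe version publishes an empty container and then fills it in. A concurrent request that lands in between sees the container but not the function, which is the same "function unavailable / variable not defined" symptom described in the question. The safe version builds the object completely first. It then publishes it with a single assignment under a lock. In Lucee the rough equivalent would likely be an exclusive `cflock` (or `lock` in script) around the initialization, with the object assembled in a local variable before it is assigned to the shared scope.

```python
import threading
import time

# Hypothetical stand-in for a shared scope such as Lucee's application
# scope: one dict that every concurrent "request" (thread) can see.
app_scope = {}
app_lock = threading.Lock()


def format_date(d):
    # Hypothetical helper that requests expect to find in the shared scope.
    return d.isoformat()


def init_utils_unsafe():
    # UNSAFE: publishes an empty container first and fills it afterwards.
    # A request running in between sees "utils" but no "format_date".
    app_scope["utils"] = {}
    time.sleep(0.001)  # widen the race window so the demo fails more often
    app_scope["utils"]["format_date"] = format_date


def request_unsafe():
    if "utils" not in app_scope:
        init_utils_unsafe()
    return app_scope["utils"]["format_date"]  # may raise KeyError


def get_utils_safe():
    # SAFE: double-checked initialization. The object is built completely in
    # a local variable and published with one assignment under the lock, so
    # no request can observe a half-built object.
    utils = app_scope.get("utils")
    if utils is None:
        with app_lock:
            utils = app_scope.get("utils")
            if utils is None:
                built = {"format_date": format_date}
                time.sleep(0.001)  # same delay as above; harmless here
                app_scope["utils"] = built
                utils = built
    return utils


def request_safe():
    return get_utils_safe()["format_date"]


def count_errors(handler, threads=50):
    """Run `handler` in many threads at once, starting from an empty scope,
    and return how many of those threads raised KeyError."""
    app_scope.clear()
    errors = 0
    errors_lock = threading.Lock()

    def worker():
        nonlocal errors
        try:
            handler()
        except KeyError:
            with errors_lock:
                errors += 1

    workers = [threading.Thread(target=worker) for _ in range(threads)]
    for t in workers:
        t.start()
    for t in workers:
        t.join()
    return errors


if __name__ == "__main__":
    print("unsafe errors:", count_errors(request_unsafe))
    print("safe errors:  ", count_errors(request_safe))
```

The unsafe count will usually be non-zero but varies from run to run, much like the intermittent errors in the question. The safe count should always be zero. Widening the delay makes the race easier to hit, which is essentially what rapid refreshing does in production.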
That said, if the page in question doesn’t produce results that you need indexed by search engines et al., you can simply block it in robots.txt. That will probably reduce how often the errors occur for now. This might at least buy you some extra time to isolate and fix the root issue without causing public-facing problems. However, as the load on your servers increases the issue will surface more frequently, so this is merely a band-aid.
And finally, I’d install FusionReactor and use it to help isolate where this condition occurs and under what circumstances it shows up. Combined with some load testing, this may make the root cause easier to find and save you some log digging and head scratching. It can also help you find other performance problems and errors in general, so it’s a handy tool to keep around.
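For the load-testing side, a small script can automate the "refresh 50+ times very quickly" reproduction from the question. It fires many concurrent requests and records which ones fail. The sketch below makes a few assumptions. It assumes a failing request surfaces as a non-200 status or an exception. If your error handler returns 200 with an error page, you would need to inspect the body instead. The URL and the function names `hammer` and `fetch_page` are hypothetical. Running it while FusionReactor is attached may help catch the failing requests as they happen.

```python
import concurrent.futures
import threading
import urllib.error
import urllib.request


def hammer(fetch, total=200, workers=20):
    """Call fetch() `total` times from `workers` threads at once.

    fetch() should return an HTTP status code. Returns a sorted list of
    (attempt_number, problem) tuples for every attempt that raised an
    exception or returned a status other than 200.
    """
    failures = []
    lock = threading.Lock()

    def attempt(i):
        try:
            status = fetch()
            if status == 200:
                return
            problem = f"HTTP {status}"
        except Exception as exc:  # record anything the request raises
            problem = repr(exc)
        with lock:
            failures.append((i, problem))

    with concurrent.futures.ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(attempt, range(total)))
    return sorted(failures)


def fetch_page(url):
    """Fetch `url` and return its HTTP status, including error statuses."""
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            resp.read()
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code


if __name__ == "__main__":
    # Hypothetical URL: replace with the page that fails intermittently.
    url = "http://localhost:8888/index.cfm"
    failures = hammer(lambda: fetch_page(url), total=500, workers=25)
    print(f"{len(failures)} of 500 requests failed")
    for i, problem in failures[:10]:
        print(i, problem)
```

The request function is passed in as an argument rather than hard-coded. That keeps the concurrency harness separate from the HTTP details, so the same runner can be checked offline with a fake request function.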
Sorry I can’t offer more cogent advice, but there’s not a lot to work with here. If you post the code and stack trace (in a linked Gist), folks can put more eyes on it and try to help.