Rando Thought: Fail-safe attribute for while-loops

bennadel · December 15, 2022, 1:28pm

This is a totally random idea. Sometimes, when I’m constructing a while() loop in CFML, I’ll add some sort of a fail-safe during development so that I don’t accidentally kick off an endless loop that crushes my CPU. Something like:

<cfscript>

	i = 10;
	failsafe = 1000;

	while ( i ) {

		i++; // ERROR: This should have been "--", loop will never end.

		// DURING DEVELOPMENT, while I am figuring out my loop logic, I might include
		// something like this as failsafe to make sure I don't crush the CPU.
		if ( ! --failsafe ) abort;

	}

</cfscript>

This way, while I’m still building up my algorithm, I can rest easy knowing my loop will end no matter what (with an abort, not a false-positive). So, my thought is: would anyone be interested in having a meta-attribute on the loop itself? Something like:

<cfscript>

	i = 10;

	while ( i ) maxIterations = 1000 {

		i++; // ERROR: This should have been "--", loop will never end.

	}

</cfscript>

This wouldn’t be for production use - only for development (though the onus is on the developer to remove it pre-deployment). And, if the loop exceeded the maxIterations count, Lucee would throw some sort of out-of-bounds error.

The Lucee language already has similar syntax with things like localmode:

public void function doIt() localmode = "modern" {
    // ....
}

… which is why this other idea occurred to me.

It feels strange to only have something that is development focused. But, on the other hand, I’ve only ever used loop times = 10 {} in R&D, never in production. So, not the craziest idea.
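Until something like maxIterations exists, the same guard can be approximated in userland. What follows is only a minimal sketch: boundedWhile() is a hypothetical helper (not a Lucee built-in), and the exception type name is made up for illustration. It takes the loop condition and the loop body as closures and throws once the iteration cap is passed.

```cfml
<cfscript>

	// Hypothetical helper, not part of Lucee: runs body() while condition()
	// is truthy, and throws if the loop tries to go past maxIterations.
	function boundedWhile(
		required function condition,
		required function body,
		numeric maxIterations = 1000
		) {

		var iterations = 0;

		while ( condition() ) {

			// The body runs at most maxIterations times.
			if ( ++iterations > maxIterations ) {
				throw(
					type = "MaxIterationsExceeded", // Assumed type name.
					message = "Loop exceeded #maxIterations# iterations."
				);
			}

			body();

		}

	}

	// Loop state lives in a struct so both closures share it by reference.
	state = { i: 10 };

	boundedWhile(
		function() { return state.i; },
		function() { state.i--; },
		1000
	);

</cfscript>
```

If the body mistakenly used state.i++ instead, the helper would throw on the 1,001st pass rather than spin forever.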

Zackster · December 15, 2022, 1:44pm

hmmm, actually we have a PR proposal which is related to this

github.com/lucee/Lucee
Adapt ByteCode to handle interrupted thread in compute loop LDEV-2573
lucee:6.0 ← xdecock:adapt-Bytecode-for-interruption-handling
opened by xdecock, 07:06PM - 16 Nov 19 UTC (+325 -6)

This is a quite aggressive approach to allow loops to be cooperatively pre-empted. Not sure it's ideal; the bytecode can more than probably be optimized, and the "tick" frequency should be adjusted. This makes it possible to escape from computationally heavy loops and avoid the stop() call. Without this approach, a simple

```
<cfscript>
	local.a=2
	do {
		local.a=local.a;
	} while(true);
</cfscript>
```

would kill the server for hours. I'm open to suggestions on how to make this more efficient.

bennadel · December 15, 2022, 1:50pm

Oh interesting. In that thread, it looks like people would actually use this in a production setting. I had only assumed this would be meaningful in development; but, it looks like it might have a broader appeal. Good to know.

bdw429s · December 15, 2022, 6:20pm

That pull request would simply make it possible for a tool like FusionReactor to kill the thread when it gets stuck in an endless loop. (Interrupting a thread in Java assumes the thread is actually paying attention so it can stop what it’s doing).
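To illustrate what "paying attention" means: a Java interrupt only sets a flag on the thread, and the running code has to check that flag and stop itself. The sketch below does that check by hand inside a CFML loop. It is an assumption-laden illustration, not what the pull request does (the PR works at the bytecode level), and the exception type name is made up.

```cfml
<cfscript>

	// Class reference to java.lang.Thread, used to call its static method.
	javaThread = createObject( "java", "java.lang.Thread" );
	total = 0;

	for ( i = 1 ; i <= 100000 ; i++ ) {

		// Cooperative check: if something has interrupted this thread,
		// bail out instead of carrying on with the computation.
		if ( javaThread.currentThread().isInterrupted() ) {
			throw(
				type = "LoopInterrupted", // Assumed type name.
				message = "Loop stopped at iteration #i#."
			);
		}

		total += i;

	}

</cfscript>
```

A loop without such a check never notices the interrupt, which is why a tight compute loop can be so hard to stop from the outside.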

@bennadel I’ve totally wished I had what you describe a few times, normally when I’ve got some sort of endless-loop bug I’m trying to track down and I don’t want the code to keep hanging until I’ve figured out the issue. I’ll oftentimes just use an application-scoped counter to bail inside the loop as a workaround.

bennadel · December 15, 2022, 8:02pm

We used to have an issue with some PDF parsing, where something in the PDF would cause an infinite loop (internal to the JAR files, not the CFML). While we were trying to figure out what we could do, we ended up spawning it in a CFThread, and then using a join/timeout in the parent request where we would then programmatically terminate the thread if it was still running.
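Roughly, that spawn / join-with-timeout / terminate pattern looks like the sketch below. The thread name and the sleep() call are stand-ins (the real PDF code isn't shown here), and the 2-second join timeout is an arbitrary choice.

```cfml
<cfscript>

	// Run the risky work in its own thread. "pdfWork" is a hypothetical
	// name, and sleep() stands in for the call that may never return.
	thread name = "pdfWork" action = "run" {
		sleep( 10000 );
	}

	// Block the parent request for at most 2,000 ms waiting on the thread.
	thread name = "pdfWork" action = "join" timeout = 2000;

	// If the thread has neither completed nor already been terminated,
	// stop it programmatically.
	if ( ! listFindNoCase( "COMPLETED,TERMINATED", cfthread.pdfWork.status ) ) {
		thread name = "pdfWork" action = "terminate";
	}

</cfscript>
```

Whether the terminate succeeds can depend on what the thread is doing at that moment, so treat this as a debugging aid rather than a guarantee.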

The things we do to debug infinite loops

AdamCameron · December 16, 2022, 10:00am

Isn’t that - and any number of other situations where code might not exit - covered by the request timeout?

bennadel · December 16, 2022, 10:44am

Sure, but if I’m in the middle of developing a tricky loop, I don’t necessarily want to wait 30 or 60 seconds for my CPU to become available again so I can fix my issue. Keep in mind, I wasn’t pitching this as a “production feature” - this is something I envision using while developing.

AdamCameron · December 17, 2022, 7:31pm

30-60sec?! That is a hugely long default request timeout! In general you want a request fulfilled in an amount of time best measured in milliseconds (maybe low hundreds of ms), right? Why would you have your request timeout set to 30-60sec? Granted in dev resources are lower so things might run more slowly, but equally less data so… less work to do. Set it to 5sec and it’ll pull the plug on your bad loop whilst you’re still going “hey what the…?!”.

I think you might be manufacturing your own situation here.

I also think if you sucked it up and took a more test-driven approach to designing your code, the frequency at which you encountered this issue would reduce dramatically; as you’d likely implement the code for the exit conditions before you wrote the code for the heavy lifting.

Again: yer manufacturing yer own situation.

Also… moving on from testing and onto Clean Code… “tricky loop”?

Your… own… manufacture.

Still: it is what it is.

I think the situation you place yerself in might be solvable by you by creating a keyboard macro that inserts this when activated:

failsafe=1
while (true) {
    if (!--failsafe) {
        throw(type="UntestedCodeAlertException", message="Naughty Ben");
    }
    // write code here. Remove that shit above once you have working code
}

That won’t ever go into a random loop from untested code misbehaving.

I don’t think this is something Lucee should busy itself with dealing with. If there’s a symptom of “naughty code”, then I think it’s on the dev to… lift their game: think of strategies of how they can improve so as to not get into these situations in the first place. It’s not for Lucee to mollycoddle them, that I think is actually deleterious to the situation @ hand.

There’s a good podcast called Working Code Podcast that covers topics like test-driven development and Clean Code. You could poss benefit from listening to it

:-p

bennadel · December 17, 2022, 8:32pm

[image: billy-madison-a-simple-wrong]

I’ll just say that not all loops are simple. Some loops are inherently complex - like traversing a node-tree. The vast majority of my loops are simple for loops or .each() loops, whereas do/while loops usually involve a dynamic condition that requires logic that isn’t always straightforward.

I’m curious to hear more about your request-timeout though. Some requests must take longer, like a bCrypt request which probably clocks in at over 1s due to the algorithm. What do you usually set your timeout to? Or are you tailoring it on a per-request basis?

AdamCameron · December 17, 2022, 9:34pm

Bahahahahahahahaha.

It’s not wrong, I just think yer trying to solve the issue in ~~the wrong place~~ a less than ideal place.

I don’t want poor suggestions with poor rationalisations making their way into Lucee. You did raise the question to get feedback, yeah?

See… yer loop control wouldn’t be a mess of inline expressions; it’d be some simple guard statements - which you design first - and probably second the evaluation of the exit-conditions into their own methods. Which when testing (part of why we test)… one can mock to make sure the logic’s all legit and works. Designing one’s code via test cases isn’t a concept that’s there just to annoy you (that’s just a bonus), it’s a well-informed strategy to avoid the situation you are falling foul of.

Yeah, exactly that. For general click->get->response web pages (or equiv via async calls) for the sake of UX you wanna make those quick, so you might as well set the global req timeout to be pretty low. That generally describes most requests in a web app and especially a web site, I think. Setting the response timeout low is a good guard strategy (like your loop control in a way) to make sure the dev is paying attn to UX expectations.

When one knows that processing is gonna take a while… then one pushes-out the timeout. But that’s the exception, not the baseline.
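As a rough sketch of "low baseline, pushed out by exception" in Lucee: set a short application-wide timeout in Application.cfc, then raise it only for requests known to be slow. The application name, the 5-second and 120-second values, and the "/reports/" path check are all assumptions for illustration.

```cfml
component {

	this.name = "TimeoutDemo"; // Hypothetical application name.

	// Low application-wide baseline: 5 seconds.
	this.requestTimeout = createTimeSpan( 0, 0, 0, 5 );

	public boolean function onRequestStart( required string targetPage ) {

		// The exception, not the baseline: a hypothetical slow area of the
		// app gets its timeout pushed out to 120 seconds for this request.
		if ( targetPage contains "/reports/" ) {
			setting requestTimeout = 120;
		}

		return true;

	}

}
```

The same setting requestTimeout statement can also sit at the top of an individual slow template instead of in onRequestStart().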

bennadel · December 18, 2022, 10:45am

I’m fascinated by this because I don’t think I’ve ever seen anyone talk about a request-timeout strategy. So, what is your default request timeout?

AdamCameron · December 18, 2022, 12:46pm

Oh god. I am currently gatekeeper of a very… erm… “legacy”… application. It does not reflect any of my own good practices, and whatever its request timeout is compared to what it should be… it is the least of our concerns ATM

On the previous CFML app I had a hand in before this one, for the front-facing stuff it would have been a coupla seconds, I think. I do not have a copy of the code to check.

I cannae vouch for the back-end apps at that previous gig, as I did not work on those ones.

bennadel · December 18, 2022, 2:10pm

It’s so interesting. I’m gonna need to let that sink in. My brain just keeps spinning off into all the reasons that a request might hang occasionally:

Garbage collection.
Database bringing new data into working memory.
“Noisy neighbor” queries that slow something down temporarily.
Fresh code being compiled into byte-code on-the-fly.

I mean, a few seconds is probably enough for most of these issues most of the time … I just need to give it a think.

Then there’s the fact that the cfthread tag is ruled by the parent request’s timeout. So you gotta take asynchronous stuff into account.

Anyway, much to chew on.

Phillyun · December 18, 2022, 3:43pm

While I haven’t heard/ read others explicitly saying it, in my experience: while loops are a code-smell.

At least for me, complex while loops often have at least one sharp edge case that cuts me.

My habit as a result is to always first prefer a defined end (ending index using from/to, or looping over a defined array, struct, or query).
For self-referencing functions that can call themselves (or for a while loop), I pass the current iteration.
At the top is my insanity check/throw for the sharp edge-case. Sometimes this is 10 (a tree with 3 levels shouldn’t get to 10) other times this is higher.
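A minimal sketch of that habit for a recursive tree walk: the current depth is passed along on each call, and the check at the top throws once it passes a cap. The walkTree() function, the node shape (name and children keys), and the exception type are hypothetical; the cap of 10 follows the number mentioned above.

```cfml
<cfscript>

	// Hypothetical walker: collects node names, passing the current depth
	// down through each recursive call.
	function walkTree(
		required struct node,
		numeric depth = 1,
		numeric maxDepth = 10
		) {

		// Check at the top: a shallow tree should never get this deep.
		if ( depth > maxDepth ) {
			throw(
				type = "MaxDepthExceeded", // Assumed type name.
				message = "Tree walk went past depth #maxDepth#."
			);
		}

		var names = [ node.name ];

		for ( var child in node.children ) {
			// Merge the child's names into this level's array.
			arrayAppend( names, walkTree( child, depth + 1, maxDepth ), true );
		}

		return names;

	}

	// Small sample tree, three levels deep.
	tree = {
		name: "root",
		children: [
			{ name: "a", children: [] },
			{ name: "b", children: [ { name: "c", children: [] } ] }
		]
	};

	writeDump( walkTree( tree ) );

</cfscript>
```

A node that accidentally lists itself (or an ancestor) as a child would trip the check instead of recursing until the stack overflows.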

Phillyun · December 18, 2022, 3:59pm

My default timeout is in minutes on my dev machine. Since FR is often unable to kill threads it’s not worth my time to see what it will/won’t kill (plus there’s no inexpensive option for dev FR so devs don’t get it).

I prefer development to be painful (for logic errors) and force devs to find / fix runaway threads. Finding this early in the process means it’s less likely to make it to production.

If (when) I get a few runaway threads, I force-kill the process and let the service auto-restart. A few seconds of time to restart vs more time to open FR and attempt to kill individual threads with several clicks per thread.

Even on production, it’s almost always faster to pull from the load balancer, wait a few seconds for existing requests to finish (the ones that will) and do the same. Thankfully this is extremely rare because runaway requests get caught early in the dev process.

AdamCameron · December 18, 2022, 5:03pm

None of that - even if those are legit concerns in prod - would warrant a request timeout of 30-60sec, would it?

But - yeah - there will be edge-cases to watch for.

AdamCameron · December 18, 2022, 5:35pm

All that, and have tests.

If one takes a test-first approach with these things, it’s really very difficult to write code that gets “tricky” (where the definition of that is “does stuff that’s not immediately apparent”).

Even if one doesn’t generally test… this is an “exception that breaks the rule” situation.

bennadel · December 19, 2022, 10:34am

I definitely prefer a regular for loop or an .each() iteration; but, I don’t really have an issue with while loops. I think, by definition, they are more complex since the condition for “done” isn’t known ahead of time. But, that has its place in some algorithms.

bennadel · December 19, 2022, 10:37am

I’m about to start a new project. Maybe I’ll set the default timeout to 10-seconds and see what happens. Even if the vast, vast majority of requests finish in under a second, I’m not sure that I want to spend too much time thinking about it. 10-seconds feels low enough to “save the server” and high enough to give rando issues a bit more wiggle-room without terminating user-requests.

Zackster · December 19, 2022, 10:52am

why not live on the wild side

setting requestTimeout="1000000"