Scheduled Tasks are Timing Out Early

alex · December 13, 2018, 5:53am

I have several tasks that run for a very long time. For example, one script queries a big, slow-running view (in SQL Server) and saves a large result to a file. As the data has grown, the task has started timing out. I have been increasing the timeout, but that no longer helps.

This is the error:

Message: request /ScheduledTasks/saveViewToFile.cfm (C:\ScheduledTasks\saveViewToFile.cfm) has run into a timeout (350 seconds) and has been stopped.
Details:
Type: expression

It’s strange that it times out after 350 seconds. I have it set for 1000 seconds.

Timeout (in seconds): 1000

I just need Lucee to allow a little more time here. In ColdFusion my tasks run far longer, for as long as the timeout value allows, and it has never been an issue.

I’d appreciate any insights.

Thank you.
Alex

Lucee 5.2.9.31
Windows Server 2016

alex · December 13, 2018, 6:43am

I found the solution in global Settings on the Request page.

Under Request timeout I checked
Request timeout in URL: When the URL parameter [RequestTimeout] is passed in the URL obey it

Now the scheduled tasks keep running.

ddspringle · December 13, 2018, 7:33pm

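And chewing through resources, which might be slowing down other requests. This change also exposes you to DDoS attacks, since anyone can now append RequestTimeout to a URL and keep a request thread busy far longer than intended. And I could go on with other issues this raises.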

Gateways are a more appropriate way to handle data munging tasks. They don’t chew through resources nearly as much and don’t use the same thread pool as requests.

HTH

– Denny

dnando · December 16, 2018, 9:39pm

I’ve got a set of “data munging” tasks that I’m running through scheduled tasks, and it’s always felt wrong to do this through a web request. I’m really curious how to set up an Event Gateway to essentially query data in the database, loop over it, and conditionally modify and/or add data to the database.

How would these events be triggered? Generally they run once a week, but there will eventually be dozens to hundreds of these operations per week.
I’ve glanced at Gert’s document describing a custom logger, but haven’t dug into it yet. How would the service methods in the application be called to run the necessary queries?

Any tips or references would be much appreciated.

pfreitag · December 16, 2018, 11:19pm

Another approach to consider, now that you can run CFML on AWS Lambda (https://fuseless.org/), would be to invoke a Lambda function to handle the processing (as long as it can be done within 15 minutes, the current Lambda timeout). You can use IAM to make sure that only your application has permission to invoke the function.

dnando · December 17, 2018, 8:01pm

Pete, Thanks for the suggestion. The facility to run Lucee on AWS Lambda looks very interesting. Would / could I invoke the Lambda with a scheduled task from the main application?

pfreitag · December 17, 2018, 8:20pm

Yes, there are lots of ways you can invoke a Lambda from a CFML application.

You could invoke it directly; here is an example of that: "Invoke Lambda from ColdFusion" by Matthew J. Clemente.

Or you could use something like SQS (Simple Queue Service): publish a message, and have the queue configured to invoke the Lambda automatically when it gets new entries.

You could use API Gateway to set up an HTTP API and use cfhttp to invoke it.

If you are using Aurora or DynamoDB, you could even create a trigger to execute the Lambda function whenever data is written to a table.

Finally, you could avoid the scheduled task altogether and use a CloudWatch Events scheduled rule to invoke the Lambda at a fixed interval.

I did not even cover all the types of events that can be used to trigger a Lambda, so yes, there are lots of ways it could work.

ddspringle · December 17, 2018, 11:37pm

Sorry for the delay in responding. I second Pete’s idea, so long as the timeout meets your needs. If it doesn’t, or if you don’t want to use AWS Lambda for whatever reason, then the simplest solution is to use a directory watcher event gateway.

A directory watcher triggers whenever you add a file to a watched directory (or modify or delete one, if you so choose).

For instance, you could write a JSON file containing the data needed for your data munging into a watched directory, and have your gateway code’s onAdd() function read that file and use its contents to execute your query.

Writing the JSON file could be triggered by a scheduled task, which is probably the most common approach. You could also trigger it from any system event, a webhook, an API call, etc. The request in this case simply writes the JSON. Quick and clean. The gateway then does the heavy lifting.
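
To make the flow concrete, here is a minimal, language-neutral sketch written in Python. It is not CFML and not Lucee's gateway API; it only illustrates the hand-off between a quick request that writes a job file and a watcher that calls an onAdd-style handler. The names write_job, on_add and poll_once, and the JSON fields view and max_rows, are all hypothetical.

```python
import json
import tempfile
from pathlib import Path


def write_job(watched_dir: Path, job_id: str, payload: dict) -> Path:
    """Request side: write the job description and return immediately."""
    tmp_path = watched_dir / f"{job_id}.json.tmp"
    final_path = watched_dir / f"{job_id}.json"
    tmp_path.write_text(json.dumps(payload), encoding="utf-8")
    # Rename last so the watcher never picks up a half-written file.
    tmp_path.replace(final_path)
    return final_path


def on_add(job_file: Path) -> dict:
    """Stand-in for the gateway's onAdd(): read the job and do the heavy work."""
    job = json.loads(job_file.read_text(encoding="utf-8"))
    # Hypothetical processing step; a real handler would run the slow query here.
    result = {"view": job["view"], "rows_requested": job["max_rows"]}
    job_file.unlink()  # remove the job file so it is not processed twice
    return result


def poll_once(watched_dir: Path, seen: set) -> list:
    """One polling pass: call on_add() for every *.json file not seen before."""
    results = []
    for job_file in sorted(watched_dir.glob("*.json")):
        if job_file.name not in seen:
            seen.add(job_file.name)
            results.append(on_add(job_file))
    return results


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        watched = Path(tmp)
        write_job(watched, "job-001", {"view": "big_slow_view", "max_rows": 5000})
        print(poll_once(watched, set()))
```

The write-then-rename step is a design choice of this sketch: it keeps the watcher from reacting to a file that is still being written. Whether you need it depends on how your gateway detects new files.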

For very large data munging tasks you can break the work down into many (hundreds or thousands of) JSON files and let the gateway process each one. In extreme cases I’ve had a gateway that takes a single JSON file and breaks it apart into X additional JSON files in another directory watched by another gateway, which then does the heavy lifting. That seems to scale best for very large data munging tasks (like the one the OP mentioned).
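
The splitting step can be sketched roughly as follows, again in Python purely as an illustration rather than as gateway code. It assumes the big job file carries a list of record IDs under a hypothetical record_ids key; split_job, chunk_dir and chunk_size are made-up names for this example.

```python
import json
import tempfile
from pathlib import Path


def split_job(big_job_file: Path, chunk_dir: Path, chunk_size: int) -> list:
    """Split one job's record IDs into smaller job files in chunk_dir."""
    job = json.loads(big_job_file.read_text(encoding="utf-8"))
    ids = job["record_ids"]
    chunk_files = []
    for start in range(0, len(ids), chunk_size):
        chunk = {
            "parent": big_job_file.stem,
            "record_ids": ids[start:start + chunk_size],
        }
        # Zero-padded sequence number keeps the chunk files in order.
        name = f"{big_job_file.stem}-{start // chunk_size:04d}.json"
        chunk_file = chunk_dir / name
        chunk_file.write_text(json.dumps(chunk), encoding="utf-8")
        chunk_files.append(chunk_file)
    return chunk_files


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        inbox, chunks = root / "inbox", root / "chunks"
        inbox.mkdir()
        chunks.mkdir()
        big = inbox / "weekly.json"
        big.write_text(json.dumps({"record_ids": list(range(10))}), encoding="utf-8")
        for f in split_job(big, chunks, chunk_size=4):
            print(f.name)
```

Here the first watcher would call split_job from its onAdd-style handler, and a second watcher on chunk_dir would process each small file independently.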

With a little use of caching you can do one long read cycle, store it in cache, and then have the gateway work against the cached data.

The drawback to gateways is they are not part of your application, and thus will not have access to any resources your application has access to unless explicitly defined in the gateway. For example, service/dao would have to be added to the gateway (e.g. new model.service.myService()), the datasource would have to be defined, etc.

HTH

– Denny

ddspringle · December 18, 2018, 12:05am

Just had a look… Pretty slick work on fuseless there bud. Thanks for sharing it!