What is the best way to make a high number of web service calls?

Hi,

I am rewriting an old application which checks, via a web service, whether a
person or a company from a list is in the registry of debtors, and retrieves
some more info from there.

( In the past I had to use a web service which required incrementally
duplicating the whole national database of debtors on my server. It was
quite troublesome, as I needed to download completely unrelated data and
constantly react to db structure changes.
Now I can finally use a new web service which checks each entity separately
and provides only the related data. )

But I have more than 4000 entities (so far) which I need to check every
day. So I am wondering what would be the best way to accomplish it?

Each check includes:

  • cfhttp call to the webservice
  • parsing quite complex, namespaced, soap response
  • and saving the data in multiple tables

I would definitely like to do it via scheduled task at night.

  1. I could simply make all the cfhttp calls in one cfloop. But I don’t
    like the idea of one giant request, which will take ages to finish. It will
    hog resources, time out or cause some other trouble.

  2. I thought a simple cfm page like this, run as a scheduled task, would
    break it down into more reasonable chunks:

    param name="url.personid" default="0";
    select top 1 personid from persons where personid > url.personid order by personid

    (process that person, then have the page call itself with the new personid)

But this actually also generates one request, taking > 50 seconds to
finish without any actual processing. So I am not sure it is any
better than the “giant loop” approach.

  3. I could come up with some javascript / ajax based solution. But I would
    like to do it on the server, so I don’t need any client running.

Would anyone have some idea or experience with something similar?

Regards

Ivan

Make a queue.

So in one db table make a list of the ids you are going to process and whether they have been processed, then make the scheduled task loop through them and tick each one off once it is done. Each time you process something, one item gets taken out of the queue.

You can then scale this, as you can in effect soft-lock a record by putting it in a “processing” status.

So your queue table might look like this:

id personID processingStartedDateTime processingEndedDateTime

So you can go and find the first personID that has a null processingStartedDateTime, enter the date into that queue row (you have started processing), do your processing, and then set the processingEndedDateTime.

At any point you can query this table to see what is being processed, and you can start multiple requests processing it (each would take a different personID).

Of course, you should wrap those updates to the queue table in a transaction.

Hope that makes sense?
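A minimal sketch of that claim-then-process cycle, assuming a PersonQueue table with the columns listed above and SQL Server-style syntax (processPerson() is a hypothetical placeholder for the actual cfhttp / SOAP-parsing / save work):

```cfml
<cfscript>
// Sketch only: soft-lock the next unprocessed person inside a
// transaction, then process it and mark it finished.
transaction {
    qNext = queryExecute("
        SELECT TOP 1 id, personID
        FROM PersonQueue
        WHERE processingStartedDateTime IS NULL
        ORDER BY id
    ");
    if (qNext.recordCount) {
        // claiming the row inside the transaction prevents two
        // concurrent requests from picking the same person
        queryExecute("
            UPDATE PersonQueue
            SET processingStartedDateTime = :started
            WHERE id = :id
        ", { started: now(), id: qNext.id });
    }
}
if (qNext.recordCount) {
    processPerson(qNext.personID);  // hypothetical: cfhttp + parse + inserts
    queryExecute("
        UPDATE PersonQueue
        SET processingEndedDateTime = :ended
        WHERE id = :id
    ", { ended: now(), id: qNext.id });
}
</cfscript>
```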

MD> On 9 Mar 2016, at 16:01, Ivan Rotrekl <@Ivan_Rotrekl> wrote:


Thanks a lot for such a quick reply!

If I understand this correctly, these are great suggestions for monitoring
and tracking the progress of the processing.

But I am actually more confused about how to *execute* the processing for
several thousand rows ( ideally via scheduled task ).

Let’s say I have a controller function which processes one person, process(personid),
and the corresponding FW/1 url /?action=person.process&personid={personid}

  1. I could write another controller function processAll() which would
    simply loop over a query of 4000 rows and invoke process(personid). I could
    then call it via scheduled task, increase the request timeout & hope for
    the best. But that seems a bit blunt.

  2. Or I could invoke via scheduled task a page or function which would call
    itself via cfhttp repeatedly (as I wrote in my original post). But I am not
    sure if that would be any better than the loop.

I could imagine client side code which would retrieve the next personid
to be processed from the server, invoke the processing, wait for it to
finish and then repeat until all done.

But I am not sure how to do it all server-side, ideally one by one. ( I
would rather make 4000 relatively short-lived requests than one giant one )

Regards

Ivan

What I would do is set an “optimal” batch size (you can vary this).

The scheduled task would call a page that processes 400 (for example). This runs every 30 seconds or every minute or whatever (a short period of time).

You can then make it self-healing, so part of your query is “ok, get me the next 400 that haven’t been processed”.

Part of the queue query can see how many items are processing (start time but no end time yet), and if that number is pretty high, you just exit.

Your script can have individual processing for each person etc., but the point here is that your system tracks and “self-heals”.

You could also add onError code in there and what not but that is another story.

So, to recap. You have a PersonQueue table with:

id personID processingStartedDateTime processingEndedDateTime failureCount

You also have a scheduled task that runs very frequently. Your script will handle the throttling by aborting early if conditions are not met:

  1. It does a query to find out how many items are being processed (WHERE processingStartedDateTime IS NOT NULL AND processingEndedDateTime IS NULL).
  2. If there are loads being processed (say 400?), quit. You are done. Other requests are doing the work.
  3. If there are a lot of OLD rows WHERE processingEndedDateTime IS NULL and processingStartedDateTime is older than the request timeout… then we had better fix these.
  4. Select them, mark them as failed once (UPDATE PersonQueue SET failureCount = failureCount + 1) and clear the processingStartedDateTime (this request can now abort).
  5. The next request comes in (from the scheduled task) and sees that we have met the parameters (there aren’t too many people being processed and there aren’t a bunch that need to be cleared up), so we can then:
  6. Get 400 people to process, set their processingStartedDateTime and get to it!
  7. When you have processed one person, set their processingEndedDateTime.
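The steps above could be sketched like this. Again, a sketch under stated assumptions, not production code: the batch size of 400 and the 10-minute stale cutoff are assumptions, and processPerson() stands in for the real cfhttp / parsing / save logic:

```cfml
<cfscript>
// Step 1+2: if enough work is already in flight, this run does nothing.
inFlight = queryExecute("
    SELECT COUNT(*) AS c FROM PersonQueue
    WHERE processingStartedDateTime IS NOT NULL
      AND processingEndedDateTime IS NULL
");
if (inFlight.c >= 400) abort;

// Step 3+4: self-heal rows that started too long ago (assumed stuck):
// count the failure and release them back to the queue.
queryExecute("
    UPDATE PersonQueue
    SET failureCount = failureCount + 1,
        processingStartedDateTime = NULL
    WHERE processingEndedDateTime IS NULL
      AND processingStartedDateTime < :cutoff
", { cutoff: dateAdd("n", -10, now()) });

// Step 5+6: claim the next batch, all with one shared start time.
batchStart = now();
queryExecute("
    UPDATE PersonQueue
    SET processingStartedDateTime = :started
    WHERE id IN (SELECT TOP 400 id FROM PersonQueue
                 WHERE processingStartedDateTime IS NULL ORDER BY id)
", { started: batchStart });
batch = queryExecute("
    SELECT id, personID FROM PersonQueue
    WHERE processingStartedDateTime = :started
", { started: batchStart });

// Step 7: process each claimed person and mark it done.
for (row in batch) {
    processPerson(row.personID);  // hypothetical: cfhttp + SOAP parse + saves
    queryExecute("
        UPDATE PersonQueue SET processingEndedDateTime = :ended WHERE id = :id
    ", { ended: now(), id: row.id });
}
</cfscript>
```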

As a quick note, I would not do:

UPDATE PersonQueue SET processingStartedDateTime = #now()#

(this was a tip from Cameron Childress actually)

But rather I would set something like request.nowtime = Now() at the top of your processing and then do:

UPDATE PersonQueue SET processingStartedDateTime = #request.nowtime#

This makes it easier to then query the items as they would all have the same start time and end time.
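In CFML that pattern is just caching the timestamp once per request and reusing it, e.g. (personQueueId is a hypothetical variable holding the row being claimed):

```cfml
<cfscript>
// Capture one timestamp at the top of the request...
request.nowtime = now();

// ...and reuse it for every queue update in this request, so all rows
// claimed together share an identical start time and are easy to query.
queryExecute("
    UPDATE PersonQueue
    SET processingStartedDateTime = :started
    WHERE id = :id
", { started: request.nowtime, id: personQueueId });
</cfscript>
```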

Does this process make more sense?

MD> On 9 Mar 2016, at 17:32, Ivan Rotrekl <@Ivan_Rotrekl> wrote:


Thank you very much for this detailed explanation. Yes, it truly does make
much more sense now. I really appreciate the time and effort you’ve spent
on this.

Best Regards

Ivan