URLDecoder timeout for very large POST requests

@bennadel This is a topic that we imported from our Google Group so I’m not sure if it actually will notify any of the original people from the thread as I think it auto-created their accounts here.

Regarding the error, do you have any idea what’s in the form scope, what content type it is, or have a standalone page that can be submitted to reproduce the issue? I’m assuming there’s some massive amount of form data that’s taking a very long time to URL decode. Does the FusionReactor profiler show any other parts of Lucee that are spending a lot of time running?

So, the Content Length in the FusionReactor profiler shows NaN. I’m not sure if that means the header is corrupted, or if it just hasn’t been processed yet. It’s crazy. We just installed FR today, so I am still getting situated.

I have to run – will report more tomorrow.

It’s possible that the content length isn’t available until the request is processed. That said, I was pretty sure most clients send the content length in the incoming HTTP headers. You may be able to capture more from your browser debugging tools (assuming you can reproduce this on demand) or with a packet sniffer.

@bennadel I have just looked back at some email conversations from 5 years ago. It does not look like this was ever resolved, but in our case the number of users sending large attachments was very low and failures of this nature were handled manually.

The app in question changed to use Mailgun in 2016 which has a different setup around these inbound email webhooks. As such, we have never seen this error again.

Not much help I am afraid!!

It looks like this client is not sending the Content-Length header. We’re going to try and add that today (we own the Client software as well). Hopefully that will shed some light.

I’m seeing something very curious in the FusionReactor stack trace / profiler. When spawning a new thread (cfthread), it looks as if Lucee is duplicating the Form scope for the thread (see screenshot).

Am I crazy to think that it’s actually performing the URL decoding again for the sake of the spawned thread? If so, we’d be incurring the cost of a large HTTP post twice.

hmmm, could be related to this bug which @bdw429s found with arrayEach https://luceeserver.atlassian.net/browse/LDEV-2559

Oh, very interesting, re: Brad’s ticket.

This morning, I did some experimentation, and I can demonstrate, I think, that as the size of the FORM post increases, the cost / time / overhead of spawning a CFThread increases dramatically. Even in a local CommandBox instance, posting a 1.5MB payload causes CFThread to take 30 seconds to spawn.

I am wondering if there is a way that I can clear the request param map prior to spawning the thread? I didn’t see anything like that in getPageContext() output.

Great investigative work @bennadel, that 30s is a bit crazy, I’m sure @micstriit will be keen to investigate

I had noticed that threads seemed to have so much overhead they were barely useful when I was working on speeding up the Lucee docs processing…

I’m digging through the Lucee source code to see if there is a way I can clear / reset the Form data before spawning the thread (in hopes that the clone operation doesn’t have to re-parse it). But, so far, no such luck. I do see that I can access the underlying servlet request; but nothing along the lines of .getFormScope().clear() seems to make a difference.

It looks like cloning the page context ultimately calls this: https://github.com/lucee/Lucee/blob/dbb220d72027fea68f927654388456515289867a/core/src/main/java/lucee/runtime/net/http/HTTPServletRequestWrap.java#L277

return ScopeUtil.getParameterMap(new URLItem[][] { form.getRaw(), url.getRaw() }, new String[] { form.getEncoding(), url.getEncoding() });

… and I don’t see any way to reset the underlying .raw data in the FormImpl.
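To make the cost under discussion concrete, here is a minimal Java sketch of form-decoding a large JSON field twice, once for the request and once more for a cloned context. It is only an illustration: it uses the JDK's `java.net.URLDecoder` as a stand-in (not Lucee's own decoder or `ScopeUtil.getParameterMap`), the class name `FormDecodeCost` and the test payload are made up, and it is not expected to reproduce the 30-second figure.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class FormDecodeCost {

    // Builds a JSON-like string of at least approxChars characters (made-up test data).
    static String buildPayload(int approxChars) {
        StringBuilder sb = new StringBuilder("[");
        do {
            sb.append("{\"id\":123,\"name\":\"a b&c\"},");
        } while (sb.length() < approxChars);
        sb.setLength(sb.length() - 1); // drop the trailing comma
        return sb.append("]").toString();
    }

    // Form-encodes the value the way a client would for a form field.
    static String encode(String raw) {
        return URLEncoder.encode(raw, StandardCharsets.UTF_8);
    }

    // Decodes the raw form value; stand-in for the work repeated on each re-parse.
    static String decode(String encoded) {
        return URLDecoder.decode(encoded, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String raw = buildPayload(1_500_000);
        String encoded = encode(raw);
        System.out.println("raw chars: " + raw.length() + ", encoded chars: " + encoded.length());

        // Two passes: the original request, then a hypothetical re-parse for a clone.
        for (int pass = 1; pass <= 2; pass++) {
            long start = System.nanoTime();
            String decoded = decode(encoded);
            long ms = (System.nanoTime() - start) / 1_000_000;
            System.out.println("decode pass " + pass + ": " + ms + " ms, round trip ok: " + decoded.equals(raw));
        }
    }
}
```

Because JSON is mostly punctuation that must be percent-encoded, the encoded field is considerably larger than the raw JSON, and any code path that rebuilds the parameter map from the raw data pays the decoding cost again.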

You could probably cheat your way around it…

1. Dump the input out to a file, urldecoded or otherwise
2. Save the temp file’s path in a string
3. cfhttp call localhost/yourtemplatelocation and pass the temp file’s path

Let that template process the content in a cfthread, and use the filesystem to pull the payload… Then the clone is only cloning the form scope which only contains the filename, not the content.
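The file handoff described above can be sketched in plain Java. This is an assumption-laden illustration rather than Lucee code: `PayloadHandoff`, `stash`, and `load` are hypothetical names, and in CFML the same idea would use the file functions and pass only the path string between requests.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class PayloadHandoff {

    // Writes the large payload to a temp file and returns only its path,
    // so whatever gets cloned later carries a short string instead of the content.
    static Path stash(String payload) {
        try {
            Path tmp = Files.createTempFile("payload-", ".json");
            Files.writeString(tmp, payload, StandardCharsets.UTF_8);
            return tmp;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads the payload back in the worker and removes the temp file afterwards.
    static String load(Path path) {
        try {
            return Files.readString(path, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            try {
                Files.deleteIfExists(path);
            } catch (IOException ignored) {
                // Best-effort cleanup; a leftover temp file is not fatal here.
            }
        }
    }

    public static void main(String[] args) {
        Path handle = stash("{\"items\":[1,2,3]}");
        System.out.println("passing only the path: " + handle);
        System.out.println("worker read: " + load(handle));
    }
}
```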

Not ideal, but if it’s more efficient, so be it. :smiley: You could maybe even solve it at the J2EE level with a servlet filter or JSP that does that work and manipulates the HTTPServletRequest before it is passed to Lucee.

Oh, that’s a super interesting idea! In my case, it’s just a giant JSON payload. I wonder if I could get the client to attach it as a File instead of just a form field.

OH CHICKENS! Sir, you may just be the best thing that happened to me this week :smiley:

I just did a quick test locally where I posted the JSON payload as a File rather than a Form Field, then I just do a fileRead( form.textFile ) and do my thread spawning.

It runs instantly! Threads spawn instantly! This could really be a viable step!

Neat, I helped Ben Nadel. Now I just need to get a pic with you sometime so I can make it into your slideshow :smiley:

Heck yeah! That would be great :smiley:

Talking to my other devs to see how feasible it is to start experimenting with this file-based approach. I’m so excited, I feel nauseous :stuck_out_tongue_winking_eye:

wait, so JSON values in form fields are getting automatically parsed by Lucee, but only in new pageContexts for threads?

I don’t think they’re being parsed like “JSON”, I think they’re just being decoded (like form-field encoding).

@bennadel I see some discussion above related to spawning threads, but I thought your original question was simply saying the initial request to Lucee was slow. Is it both?

I have definitely complained about the overhead of cloning the full servlet request object and how it gets slower the more stuff is in there. Micha has been resistant to consider any sort of changes to that process, but I do think it’s a real issue. Can you please enter a ticket for this, Ben? Should be easy to whip up a standalone example of submitting a huge JSON form field.

And a final thought: you mentioned this was faster when you sent a file. I assume you meant attaching it as a separate MIME content section? What happens if you send the raw JSON in the request body with a JSON content type so it’s not URL encoded form fields?
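For anyone who wants to try the raw-body variant, here is a rough Java sketch of the client side using the JDK's `java.net.http` API. The endpoint URL and the `RawJsonPost` class are placeholders, and whether this avoids the thread-spawn cost in Lucee has not been tested in this thread. On the Lucee side, the body would presumably be read with something like `getHttpRequestData().content` rather than from the Form scope.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

public class RawJsonPost {

    // Hypothetical endpoint; substitute the real template URL.
    static final String ENDPOINT = "http://localhost:8080/receive.cfm";

    // Builds a POST whose body is the JSON itself, not a URL-encoded form field.
    static HttpRequest build(String json) {
        return HttpRequest.newBuilder(URI.create(ENDPOINT))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(json, StandardCharsets.UTF_8))
                .build();
    }

    public static void main(String[] args) {
        HttpRequest request = build("{\"items\":[1,2,3]}");
        // The publisher knows the body size up front, so a length can be reported.
        long bodyBytes = request.bodyPublisher()
                .map(HttpRequest.BodyPublisher::contentLength)
                .orElse(-1L);
        System.out.println(request.method() + " " + request.uri() + " (" + bodyBytes + " body bytes)");
        // Actually sending it would use HttpClient.newHttpClient().send(request, ...).
    }
}
```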

Would be easy enough to tweak your test case to lead with invalid JSON characters.