XML parsing and editing of big (not huge) files

Hello Lucee community.

I’m building a web tool that

  • parses an XML document (always a different one)
  • retrieves some nodes
  • edits some attributes
  • saves it back.

On big documents (>3-4 MB) the procedure just hangs without any error
message. Since it stalls the whole Lucee engine, I suspect an out-of-memory
error caused by XMLParse() loading the whole document into memory.
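For reference, here is roughly how the parse / retrieve / edit / save cycle looks when written with the JDK's built-in DOM, XPath and Transformer APIs. This is a plain-Java sketch and not Lucee's actual XMLParse() implementation. The `DomRoundTrip` class, the `<item>` elements and the `status` attribute are hypothetical. The sketch shows that step 1 materializes every node of the document on the heap before anything is edited, so memory use grows with file size.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class DomRoundTrip {
    // Sets attribute `attr` to `value` on every element matched by `xpath`
    // and returns the re-serialized document (hypothetical helper).
    static String editAll(String xml, String xpath, String attr, String value) {
        try {
            // 1. parse: the entire document becomes a tree of objects on the heap
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            // 2. retrieve: select the target elements with XPath
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(xpath, doc, XPathConstants.NODESET);

            // 3. edit: change the attribute on each matched element
            for (int i = 0; i < nodes.getLength(); i++) {
                ((Element) nodes.item(i)).setAttribute(attr, value);
            }

            // 4. save: serialize the whole (still in-memory) tree
            Transformer out = TransformerFactory.newInstance().newTransformer();
            out.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter sw = new StringWriter();
            out.transform(new DOMSource(doc), new StreamResult(sw));
            return sw.toString();
        } catch (ParserConfigurationException | SAXException | IOException
                 | XPathExpressionException | TransformerException e) {
            throw new IllegalStateException("DOM round trip failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<items><item id=\"1\" status=\"new\"/><item id=\"2\" status=\"new\"/></items>";
        System.out.println(editAll(xml, "//item[@id='2']", "status", "done"));
    }
}
```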

I’ve seen several approaches using a Java InputStream or CFML readline(), but
these are good for reading, not for editing, an XML file.
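Streaming and editing are not necessarily mutually exclusive. A pull parser can copy the document event by event and rewrite only the start tags it cares about. Below is a minimal sketch using the JDK's StAX API (`javax.xml.stream`), which nobody in this thread mentions. The class name and the element and attribute names are illustrative assumptions. Only the current event is held in memory, so heap use should stay roughly flat regardless of file size. The trade-off is that each change must be decidable from the current tag alone: there is no XPath and no look-ahead.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import javax.xml.stream.XMLEventFactory;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.Attribute;
import javax.xml.stream.events.StartElement;
import javax.xml.stream.events.XMLEvent;

class StaxAttributeEditor {
    // Copies `in` to `out` (written as UTF-8) one event at a time. On every start tag
    // whose local name is `element`, attribute `attr` is set to `value`.
    // Only the current event is held in memory, never the whole document.
    static void edit(InputStream in, OutputStream out, String element, String attr, String value)
            throws XMLStreamException {
        XMLEventFactory events = XMLEventFactory.newInstance();
        XMLEventReader reader = XMLInputFactory.newInstance().createXMLEventReader(in);
        XMLEventWriter writer = XMLOutputFactory.newInstance().createXMLEventWriter(out, "UTF-8");
        try {
            while (reader.hasNext()) {
                XMLEvent event = reader.nextEvent();
                if (event.isStartDocument()) {
                    // declare the encoding we actually write, not the input's
                    event = events.createStartDocument("UTF-8", "1.0");
                } else if (event.isStartElement()
                        && event.asStartElement().getName().getLocalPart().equals(element)) {
                    event = withAttribute(events, event.asStartElement(), attr, value);
                }
                writer.add(event);
            }
            writer.flush();
        } finally {
            writer.close();
            reader.close();
        }
    }

    // Returns a copy of `start` with attribute `attr` replaced by (or added as) `value`.
    private static StartElement withAttribute(XMLEventFactory events, StartElement start,
                                              String attr, String value) {
        List<Attribute> attrs = new ArrayList<>();
        Iterator<?> it = start.getAttributes();
        while (it.hasNext()) {
            Attribute a = (Attribute) it.next();
            if (!a.getName().getLocalPart().equals(attr)) {
                attrs.add(a); // keep every other attribute as-is
            }
        }
        attrs.add(events.createAttribute(attr, value));
        return events.createStartElement(start.getName(), attrs.iterator(), start.getNamespaces());
    }

    // String-in/string-out wrapper, handy for trying the editor out.
    static String editString(String xml, String element, String attr, String value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            edit(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)),
                 out, element, attr, value);
        } catch (XMLStreamException e) {
            throw new IllegalStateException("StAX edit failed", e);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }
}
```

For real files, pass streams over the input and output files and write to a separate output file rather than over the input while it is still being read. The match is by local name only. Anything that needs more context, such as "only the item with id 2", has to be checked by hand, for instance against the other attributes of the same start tag. Re-serializing can also change cosmetic details, for example `<item/>` may come out as `<item></item>`.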

My documents range from a few KB to 70-80 MB. They’re big, but they shouldn’t
be that huge…

Is there any library or method I could use to do that?

Thank you in advance for your answers.
Enrico

Hi Enrico,

We use Saxon EE for XSLT and schema validation; XML changes are done via
XSLT. There is a free version too, but it doesn’t support streaming:

Saxon XSLT - Wikipedia
Saxonica: Welcome

We still use some CF XML functions. After we switched to Lucee (with its
default memory settings), we saw the same server behavior you describe
(caused by long-running XML imports). The server now has an 8 GB heap and
runs stably.
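Here is a minimal sketch of the XSLT approach through the standard JAXP `javax.xml.transform` API. An identity template copies everything, and a second, more specific template overrides the one attribute to change. The sketch uses whatever `TransformerFactory` is on the classpath; the JDK ships an XSLT 1.0 processor. Saxon also implements this API, but the streaming Walter mentions is a Saxon EE feature with its own stylesheet requirements, which this sketch does not cover. Without streaming, an XSLT processor generally builds the whole input tree in memory too. The `item`/`status` names and the `XsltAttributeEdit` class are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class XsltAttributeEdit {
    // Identity transform plus one override rule (hypothetical element/attribute names).
    static final String STYLESHEET = String.join("\n",
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>",
        "  <xsl:output method='xml' omit-xml-declaration='yes'/>",
        "  <!-- identity template: copy every node and attribute unchanged -->",
        "  <xsl:template match='@*|node()'>",
        "    <xsl:copy><xsl:apply-templates select='@*|node()'/></xsl:copy>",
        "  </xsl:template>",
        "  <!-- override: the status attribute of the item with id 2 becomes 'done' -->",
        "  <xsl:template match=\"item[@id='2']/@status\">",
        "    <xsl:attribute name='status'>done</xsl:attribute>",
        "  </xsl:template>",
        "</xsl:stylesheet>");

    // Applies STYLESHEET to `xml` and returns the transformed document.
    static String apply(String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSLT transform failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(apply(
            "<items><item id=\"1\" status=\"new\"/><item id=\"2\" status=\"new\"/></items>"));
    }
}
```

The design keeps the edit logic declarative. Changing what gets edited means changing the stylesheet, not the Java code.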

Walter

I agree, large XML files are definitely not a problem in themselves, but check
the memory settings you are giving to Lucee.

A
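A quick way to see which heap limits the JVM running Lucee actually has is to ask the JVM itself. Here is a small Java sketch; the `HeapReport` class is hypothetical. `Runtime.maxMemory()` roughly corresponds to the `-Xmx` setting, or to the JVM's default when none is given.

```java
class HeapReport {
    // One-line summary of the running JVM's heap: limit, currently committed, in use.
    static String summary() {
        Runtime rt = Runtime.getRuntime();
        long mb = 1024L * 1024L;
        long used = rt.totalMemory() - rt.freeMemory();
        return String.format("max=%d MB, committed=%d MB, used=%d MB",
                rt.maxMemory() / mb, rt.totalMemory() / mb, used / mb);
    }

    public static void main(String[] args) {
        System.out.println(summary());
    }
}
```

From CFML, the same methods can usually be reached through `createObject("java", "java.lang.Runtime").getRuntime()`.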

You received this message because you are subscribed to the Google Groups
“Lucee” group.
To unsubscribe from this group and stop receiving emails from it, send an
email to lucee+unsubscribe@googlegroups.com.
To post to this group, send email to lucee@googlegroups.com.
To view this discussion on the web visit
https://groups.google.com/d/msgid/lucee/16aa33c6-cf38-4a57-8f81-384cdac35116%40googlegroups.com
https://groups.google.com/d/msgid/lucee/16aa33c6-cf38-4a57-8f81-384cdac35116%40googlegroups.com?utm_medium=email&utm_source=footer
.
For more options, visit https://groups.google.com/d/optout.

Hi Enrico,

Yes, that is where you set the memory settings. It is impossible to give
“recommended” settings, as it completely depends on your application. Set
them to some value, then test your application, monitor performance,
memory usage, etc., and see how it behaves.
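For XML work, one rough way to attach a number to "depends on your application" is to measure how much heap a DOM of a representative document occupies. Below is a Java sketch. The synthetic document and the class and method names are assumptions. Because `System.gc()` is only a hint, the result is an order-of-magnitude estimate rather than a measurement; a profiler or APM tool gives more trustworthy figures.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

class DomFootprint {
    // Builds a synthetic XML document with `count` <item> elements.
    static byte[] sampleXml(int count) {
        StringBuilder sb = new StringBuilder("<items>");
        for (int i = 0; i < count; i++) {
            sb.append("<item id=\"").append(i).append("\" status=\"new\"/>");
        }
        return sb.append("</items>").toString().getBytes(StandardCharsets.UTF_8);
    }

    static long usedHeap() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    // Rough heap growth caused by holding a DOM tree of `xml` (never negative).
    static long estimateDomBytes(byte[] xml) {
        try {
            System.gc();
            long before = usedHeap();
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml));
            System.gc();
            long after = usedHeap();
            // touch `doc` after the second measurement so it stays reachable until then
            doc.getDocumentElement().getTagName();
            return Math.max(0, after - before);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("parse failed", e);
        }
    }

    public static void main(String[] args) {
        byte[] xml = sampleXml(200_000);
        long bytes = estimateDomBytes(xml);
        System.out.printf("file: %d KB, DOM: ~%d KB (x%.1f)%n",
                xml.length / 1024, bytes / 1024, (double) bytes / xml.length);
    }
}
```

Running it under the heap size you plan to use, with a document shaped like your real data, gives a rough multiplier to apply to the 70-80 MB files mentioned earlier in the thread.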

To measure performance you can use many different applications, but I would
recommend either Fusion Reactor (http://www.fusion-reactor.com/) or New
Relic (http://newrelic.com/). Fusion Reactor is more CFML-specific and is a
bit easier to install than New Relic. New Relic is more of an “industry
standard” for Java APM but requires a bit more work. I recently put together
a helper CFC for New Relic (mso-net/lucee-newrelic on GitHub,
https://github.com/mso-net/lucee-newrelic) that gives you more CFML
information than the out-of-the-box installation. Both have 14-day free
trials, and New Relic has a “free” tier that you can continue to use after
14 days, but with a limited information set and only 24 hours of data
retention.

Kind regards,

Andrew
about.me http://about.me/andrew_dixon
mso http://www.mso.net - Lucee http://lucee.org - Member

On 31 March 2015 at 23:42, Enrico <@Enrico> wrote:

Thank you all for your answers. I’ll check the memory settings. Since I’m
on Tomcat, I guess I should set them in setenv.sh.

Are there any suggested settings for Lucee?


Enrico

(Sent from iPhone)

I have added a new entry to the wiki that shows how you can use the
event-driven XML parser (SAX) with Lucee:
https://bitbucket.org/lucee/lucee/wiki/Cookbook_SAX

This parser does not keep any data in memory on its own; it is completely
up to you what to store. So you can read XML files of any size.
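Here is the same event-driven idea in plain Java, using the JDK's `javax.xml.parsers.SAXParser`. This is a sketch, not a copy of the Cookbook code. The handler is called once per start tag and keeps only the attribute values it is asked for. The class, element and attribute names are hypothetical. SAX itself is read-only. To write an edited copy, you would combine it with an output writer or use a streaming reader/writer pair such as StAX.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class SaxAttributeCollector extends DefaultHandler {
    private final String element;
    private final String attr;
    // The handler decides what to keep; here only the matching attribute values.
    private final List<String> values = new ArrayList<>();

    SaxAttributeCollector(String element, String attr) {
        this.element = element;
        this.attr = attr;
    }

    // Called once per start tag while the parser streams through the input.
    // The default factory is not namespace-aware, so the name arrives in qName.
    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (qName.equals(element)) {
            String v = atts.getValue(attr);
            if (v != null) {
                values.add(v);
            }
        }
    }

    static List<String> collect(InputStream in, String element, String attr) {
        SaxAttributeCollector handler = new SaxAttributeCollector(element, attr);
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(in, handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("SAX parse failed", e);
        }
        return handler.values;
    }

    public static void main(String[] args) {
        String xml = "<items><item id=\"1\" status=\"new\"/><item id=\"2\" status=\"done\"/></items>";
        InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        System.out.println(collect(in, "item", "status"));
    }
}
```

Memory use then depends only on what the handler chooses to keep, which matches the point above.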

Micha

Cool stuff! 🙂

Igal Sapir
Lucee Core Developer
Lucee.org http://lucee.org/
