toString not generating valid XML

There is a difference between the ACF10 toString( value ) output and Lucee
toString( value ) output when the value is XML. I’ve attached a simple
HTML document that is read using xmlParse( ). I’ve also attached 2
additional files that represent the output from ACF10 and Lucee (FINAL
4.5.2.007).

Lucee output seems to be missing the first line (<?xml …) and has not
properly closed the META, LINK, BR and IMG tags.

Suggestions for resolving this?

Images are missing…

On Saturday, 15 August 2015 01:10:34 UTC+2, Marilou Landes wrote:

There is a difference between the ACF10 toString( value ) output and Lucee
toString( value ) output when the value is XML. I’ve attached a simple
HTML document that is read using xmlParse( ). I’ve also attached 2
additional files that represent the output from ACF10 and Lucee (FINAL
4.5.2.007).

You didn’t include code to demonstrate how you’re processing the HTML
file, which is pretty key to this.

However this will do it:

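// Work out where the test files live and which engine is running.
dir = expandPath("./");
platform = structKeyExists(server, "lucee") ? "lucee" : "coldFusion";
// Parse the attached file as XML, then write the serialized result per platform.
xml = xmlParse("#dir#before.html");
fileWrite("#dir##platform#.xml", toString(xml));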

Running that on CF11 and Lucee 4.5, I am seeing the same thing you are.

Suggestions for resolving this?

  1. File a bug.
  2. Await someone to fix it, or
  3. Fix it yourself.

Lucee’s XML parsing just doesn’t seem to work in this situation.

On Saturday, 15 August 2015 00:10:34 UTC+1, Marilou Landes wrote:


Adam

Lucee’s XML parsing just doesn’t seem to work in this situation.

There is also a second bug with this:
https://luceeserver.atlassian.net/browse/LDEV-492

“Two for the price of one” day today, apparently.

On Saturday, 15 August 2015 15:36:16 UTC+1, Adam Cameron wrote:

Instead of using xmlParse, Lucee should have an htmlParse function (can’t check right now as I’m on my phone). Try that.

Mark Drew

  • Sent by typing with my thumbs.

On 15 Aug 2015, at 00:10, Marilou Landes <@Marilou_Landes> wrote:

See Lucee at CFCamp Oct 22 & 23 2015 @ Munich Airport, Germany - Get your ticket NOW - http://www.cfcamp.org/

You received this message because you are subscribed to the Google Groups “Lucee” group.
To unsubscribe from this group and stop receiving emails from it, send an email to lucee+unsubscribe@googlegroups.com.
To post to this group, send email to lucee@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/lucee/3d62876e-b162-4108-b339-09d1644fe528%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
<Before.html>
<AfterACF10.html>
<AfterLucee.html>

Images are missing…

The images aren’t really relevant. It’s the contents of the files that the OP
means for us to look at.

On Saturday, 15 August 2015 01:16:32 UTC+1, Francesco Pepe wrote:

Aha, I got you.

But they did ask for a workaround, no?

Anyway, I didn’t miss the overall point. There seems to be a bug, you are
right, but I gave an option for getting round it.

The source HTML in Before.html might have closing tags, but it is NOT
marked as proper XHTML (it’s marked as HTML5), so it gets processed (by
TagSoup, I am guessing?) as a non-XML variant, right?

Not marked as

And if you DO mark Before.html as proper XHTML, it then xmlParse()s
correctly as:

<?xml version="1.0" encoding="UTF-8"?><html xmlns="http://www.w3.org/1999/xhtml">XMLParse and toString Test

XMLParse and toString Test

someFile

So in the first case you are saying “hey Lucee, xmlParse this HTML5 doc,
which doesn’t need stuff to be closed”, and it then parses it following the
rules of HTML5.

If you properly mark it up as XHTML, then it DOES parse it with those
rules. Basically, Lucee is doing what you asked it to.

Regards

Mark Drew

Adam Cameron mailto:Adam_Cameron
16 August 2015 10:25
You are somewhat missing the point, Mark. The source document
/is/ XML. The dialect of mark-up the file contains might be XHTML, but
it’s well-formed and it is valid XML. Look at it.

So all the operation is doing is taking some XML, converting it to a
Lucee XML object, then turning it back into XML again.

And Lucee is ballsing up the last step. That is the issue here: the
toString() function is not emitting valid XML. The problem does not
lie with the initial parsing.

On Sunday, 16 August 2015 10:04:14 UTC+1, Mark Drew wrote:

Mark Drew mailto:Mark_Drew
16 August 2015 10:04
I did the following example, using your test files and I got the
output that you might expect

xmlHTML = xmlParse(FileRead("Before.html"));
htmlHtml = htmlParse(FileRead("Before.html"));

dump(toString(xmlHTML));
dump(toString(htmlHtml));

The output of xmlParse is:

XMLParse and toString Test

XMLParse and toString Test

someFile

Maintaining the actual format of HTML even though you used the
xmlParse, in an (odd?) way I can see why it would do this.

The output of htmlParse gives you more XML-like output:

<?xml version="1.0" encoding="UTF-8"?><html xmlns="http://www.w3.org/1999/xhtml" lang="en">XMLParse and toString Test

XMLParse and toString Test

someFile

Which I think is more akin to what you were expecting (closing the br
and img tags)

Hope that helps

Regards

Mark Drew

Marilou Landes mailto:Marilou_Landes
16 August 2015 03:55
I will look at the htmlParse function, but the issue is really with
toString(), whereby the XML document object is rewritten as a
string. Null elements, like <br>, need to be closed properly, as in <br />.

On Saturday, August 15, 2015 at 11:15:06 AM UTC-5, Mark Drew wrote:

Mark Drew mailto:Mark_Drew
15 August 2015 17:14
Instead of using xmlParse, Lucee should have an htmlParse function
(can’t check right now as I’m on my phone). Try that.

Mark Drew

  • Sent by typing with my thumbs.


Yep. Interestingly, the same behaviour Lucee is exhibiting.

On 16 August 2015 at 22:31, Adam Cameron <@Adam_Cameron> wrote:

I can only presume your browser is showing you a back-working of its
parsed DOM document, not the original file contents.

So you are using xmlParse to turn a non-XML document into an XML
document? (The toString is just serializing it.)

That’s the source of before.html according to my machine. Are you seeing
something different?

On 16 Aug 2015 20:39, “Adam Cameron” <@Adam_Cameron> wrote:

On Sunday, 16 August 2015 10:51:31 UTC+1, James Holmes wrote:

I looked at it. I see this:

<META http-equiv="Content-Type" content="text/html; charset=utf-8">

XMLParse and toString Test


That is not valid XML.

No, that isn’t. But… where did you get that from? It’s not the doc
under discussion here.



Could be. I think it might be the SAX parser, but that is as far as I got looking into the XMLUtil and parseXml funcs.

Mark Drew

  • Sent by typing with my thumbs.

On 16 Aug 2015, at 11:47, Adam Cameron <@Adam_Cameron> wrote:

On Sunday, 16 August 2015 10:51:07 UTC+1, Mark Drew wrote:

The source HTML in Before.html might have closing tags, but it is NOT marked as proper XHTML (it’s marked as HTML5), so it gets processed (by TagSoup, I am guessing?) as a non-XML variant, right?

Whatever it’s doing, it’s predicated on the outer tags being . If I just change those to be - even if I leave the - then it behaves as one would expect it to.

So it does seem like it’s performing some unsolicited guess work here.
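
For anyone wanting to poke at this outside CFML, here is a minimal Java sketch of one possible cause. It assumes (nothing in this thread confirms it) that toString() ends up in a JAXP identity Transformer with no output method set. Under the XSLT output rules the serializer may then pick the "html" method when the root element is an un-namespaced html element, which would drop the XML declaration and leave BR and friends unclosed. The names OutputMethodDemo, parse and serialize are made up for illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class; not taken from Lucee's source.
final class OutputMethodDemo {

    // Parse a well-formed XML string into a DOM document.
    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        return factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Serialize with an identity transform.
    // Passing null leaves the output method unset, so the serializer chooses one.
    static String serialize(Document doc, String method) throws Exception {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        if (method != null) {
            transformer.setOutputProperty(OutputKeys.METHOD, method);
        }
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<html><body>a<br/>b</body></html>");
        // Output method left to the serializer's own choice.
        System.out.println("unset: " + serialize(doc, null));
        // Output method forced to XML.
        System.out.println("xml:   " + serialize(doc, "xml"));
    }
}
```

If that assumption holds, explicitly requesting the "xml" method (the second call) is the sort of change that would be needed on the serializing side, and it would also explain why renaming the outer tags or adding the XHTML namespace makes the problem go away.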



That’s the source of before.html according to my machine. Are you seeing
something different?

Yup. The file contains this:

XMLParse and toString Test

XMLParse and toString Test

someFile

That’s from the actual file, not from browsing to it and looking at the
source. I can only presume your browser is showing you a back-working of
its parsed DOM document, not the original file contents.

On Sunday, 16 August 2015 13:01:30 UTC+1, James Holmes wrote:


The source HTML in Before.html might have closing tags, but it is NOT
marked as proper XHTML (it’s marked as HTML5), so it gets processed (by
TagSoup, I am guessing?) as a non-XML variant, right?

Sorry, you’re fixating on a tangential explanatory comment I made, which
overall is not (well, ought not be) relevant. I was using the term “XHTML”
merely as an explanation that HTML can indeed also be XML. XHTML has more
rules beyond just that, as you point out.

xmlParse() should not be guessing at dialects, it should be doing what it’s
told: here’s some XML… parse it.

I slightly misspoke before though. I should have restricted my description
of the doc’s appropriateness to the notion of “well-formed”, not “valid”.
On reading the XML spec, these are two different things (I did not realise
there was this distinction, nor that those terms are meaningful in the
context of XML). XML is “well-formed” if its tags balance out etc. But to be
“valid” it needs to have a doctype and a DTD (although I suspect these days a
schema would also be OK in lieu of a DTD… I’m only reading this thing
superficially).

Now… xmlParse() could have been implemented to reject any non-VALID XML,
however I think we can all agree that’s unnecessarily restrictive. Who here
has really spent much time ensuring their XML has a DTD, and the XML
follows it? I actually have in the past, but not for a very long time as it
all seems like a lot of work for little real-world gain. Similarly with
schemas. Anyhow, Adobe took the pragmatic route and decided “well-formed”
was the requirement for an XML string to be parseable. This is the cue
Railo and then Lucee should (and as far as I can tell: did) take.

If an XML string has DTD information, then the XML must comply with
the DTD. Same with a schema.
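
To make the well-formed versus valid distinction concrete, here is a small Java sketch using the JDK's own JAXP parser. That choice is an assumption for illustration only; nothing here says this is how xmlParse() is implemented. A non-validating parse only demands well-formedness, while a validating parse of a document that carries no DTD reports validity errors. The names WellFormedVersusValid, Collector, isWellFormed and validityErrors are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

// Hypothetical demo class; not taken from Lucee or ColdFusion.
final class WellFormedVersusValid {

    // Records recoverable (validity) errors; rethrows fatal (well-formedness) errors.
    static final class Collector implements ErrorHandler {
        final List<String> validityErrors = new ArrayList<>();

        @Override public void warning(SAXParseException e) { }

        @Override public void error(SAXParseException e) {
            validityErrors.add(e.getMessage());
        }

        @Override public void fatalError(SAXParseException e) throws SAXParseException {
            throw e;
        }
    }

    // True when a non-validating parser accepts the string.
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            builder.setErrorHandler(new Collector());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (Exception e) {
            return false;
        }
    }

    // Validity errors reported by a DTD-validating parse of a well-formed string.
    static List<String> validityErrors(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        Collector collector = new Collector();
        builder.setErrorHandler(collector);
        builder.parse(new InputSource(new StringReader(xml)));
        return collector.validityErrors;
    }

    public static void main(String[] args) throws Exception {
        String closed = "<html><body>a<br/>b</body></html>";
        String unclosed = "<html><body>a<br>b</body></html>";
        System.out.println("closed well-formed:   " + isWellFormed(closed));
        System.out.println("unclosed well-formed: " + isWellFormed(unclosed));
        System.out.println("closed validity errors: " + validityErrors(closed));
    }
}
```

The document with a closed br passes the first check but still fails validation, which is the same situation as the "no DTD found" complaint: well-formed, yet not valid in the strict sense.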

However, the XML parser should not make unsolicited guesses as to what the
dialect of the XML is. If no such information is provided, then no such
information should be inferred. And if Lucee is doing that (as you
speculate it might be), it is wrong to do so.

On Sunday, 16 August 2015 10:51:07 UTC+1, Mark Drew wrote:

It’s not XML

validity error: Validation failed: no DTD found !> James Holmes mailto:James_Holmes

16 August 2015 10:51
I looked at it. I see this:

<META http-equiv="Content-Type"

content="text/html;

XMLParse and toString Test


That is not valid XML.


Shu Ha Ri: Agile/Lean Product Development blog -
http://www.bifrost.com.au/
Agile in 140 characters or less - https://twitter.com/James_R_Holmes
Whatever LinkedIn is for - http://www.linkedin.com/in/jrholmes


See Lucee at CFCamp Oct 22 & 23 2015 @ Munich Airport, Germany - Get
your ticket NOW - http://www.cfcamp.org/

You received this message because you are subscribed to the Google
Groups “Lucee” group.
To unsubscribe from this group and stop receiving emails from it, send
an email to lucee+unsubscribe@googlegroups.com
mailto:lucee+unsubscribe@googlegroups.com.
To post to this group, send email to lucee@googlegroups.com
mailto:lucee@googlegroups.com.
To view this discussion on the web visit
https://groups.google.com/d/msgid/lucee/CAEhA4MrPGgETe0GhvuDuCZejse6T9J8bkN%3DQktw1R_0uDTp42w%40mail.gmail.com
https://groups.google.com/d/msgid/lucee/CAEhA4MrPGgETe0GhvuDuCZejse6T9J8bkN%3DQktw1R_0uDTp42w%40mail.gmail.com?utm_medium=email&utm_source=footer.
For more options, visit https://groups.google.com/d/optout.
Adam Cameron mailto:Adam_Cameron
16 August 2015 10:25
You are somewhat missing the point, Mark. The source document
/is/ XML. The dialect of mark-up the file contains might be XHTML, but
it’s well-formed and it is valid XML. Look at it.

So all the operation is doing is taking some XML, converting it to a
Lucee XML object, then turning it back into XML again.

And Lucee is ballsing up the last step. That is the issue here: the
toString() function is not emitting valid XML. The problem does not
lie with the initial parsing.

On Sunday, 16 August 2015 10:04:14 UTC+1, Mark Drew wrote:

See Lucee at CFCamp Oct 22 & 23 2015 @ Munich Airport, Germany - Get
your ticket NOW - http://www.cfcamp.org/

You received this message because you are subscribed to the Google
Groups “Lucee” group.
To unsubscribe from this group and stop receiving emails from it, send
an email to lucee+unsubscribe@googlegroups.com
mailto:lucee+unsubscribe@googlegroups.com.
To post to this group, send email to lucee@googlegroups.com
mailto:lucee@googlegroups.com.
To view this discussion on the web visit
https://groups.google.com/d/msgid/lucee/29911346-c952-44f2-b41b-63eaadef30c7%40googlegroups.com
https://groups.google.com/d/msgid/lucee/29911346-c952-44f2-b41b-63eaadef30c7%40googlegroups.com?utm_medium=email&utm_source=footer.
For more options, visit https://groups.google.com/d/optout.
Mark Drew mailto:Mark_Drew
16 August 2015 10:04
I did the following example, using your test files and I got the
output that you might expect

xmlHTML = xmlParse(FileRead(“Before.html”));
htmlHtml = htmlParse(FileRead(“Before.html”));

dump(toString(xmlHTML));
dump(toString(htmlHtml));

The output of xmlParse is:

XMLParse and toString Test

XMLParse and toString Test

someFile

Maintaining the actual format of HTML even though you used the
xmlParse, in an (odd?) way I can see why it would do this.

The output of the htmlParsing gives you a more XML like output:

<?xml version="1.0" encoding="UTF-8"?><html

xmlns=“XHTML namespace” lang=“en”>XMLParse and toString
Test

XMLParse and toString Test

someFile

Which I think is more akin to what you were expecting (closing the br
and img tags)

Hope that helps

Regards

Mark Drew

Marilou Landes mailto:Marilou_Landes
16 August 2015 03:55
I will look at the htmlParse function, but the issue is really with
the toString(), whereby the xml document object is rewritten as a
string. Null elements, like
need to be closed properly, as in

.

On Saturday, August 15, 2015 at 11:15:06 AM UTC-5, Mark Drew wrote:

Mark Drew
15 August 2015 17:14
Instead of using xmlParse, try htmlParse. Lucee should have that
function (I can’t check right now as I’m on my phone).

Mark Drew

  • Sent by typing with my thumbs.

<Before.html>
<AfterACF10.html>
<AfterLucee.html>
