There is a difference between the ACF10 toString( value ) output and Lucee
toString( value ) output when the value is XML. I’ve attached a simple
HTML document that is read using xmlParse( ). I’ve also attached two
additional files that represent the output from ACF10 and Lucee (FINAL
4.5.2.007).
Lucee output seems to be missing the first line (<?xml …) and has not
properly closed the META, LINK, BR and IMG tags.
Images are missing…
(In reply to Marilou Landes, Saturday, 15 August 2015 01:10:34 UTC+2)
You didn’t include code to demonstrate how you’re processing the HTML
file, which is pretty key to this.
However this will do it:
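// Directory holding before.html
dir = expandPath("./");
// Name the output file after the engine running the code
platform = structKeyExists(server, "lucee") ? "lucee" : "coldFusion";
// Parse the file, then serialize the XML object straight back to a string
xml = xmlParse("#dir#before.html");
fileWrite("#dir##platform#.xml", toString(xml));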
Running that on CF11 and Lucee 4.5, I am seeing the same thing you are.
Suggestions for resolving this?
File a bug.
Await someone to fix it, or
Fix it yourself.
Lucee’s XML parsing just doesn’t seem to work in this situation.
(In reply to Marilou Landes, Saturday, 15 August 2015 00:10:34 UTC+1)
Instead of doing xmlParse, Lucee should have an htmlParse function (can’t check right now as I’m on my phone); try that.
Mark Drew
Sent by typing with my thumbs.
(In reply to Marilou Landes, 15 Aug 2015, 00:10)
See Lucee at CFCamp Oct 22 & 23 2015 @ Munich Airport, Germany - Get your ticket NOW - http://www.cfcamp.org/
The images aren’t really relevant. It’s the contents of the files the OP
means for us to look at.
(In reply to Francesco Pepe, Saturday, 15 August 2015 01:16:32 UTC+1)
Anyway, I didn’t miss the overall point. There seems to be a bug, you are
right, but I gave an option for getting round it.
The source HTML in Before.html might have closing tags, but it is NOT
marked as proper XHTML (it’s marked as HTML5), so it gets processed (by
TagSoup? I am guessing here) as a non-XML variant, right?
Not marked as
And if you DO mark the BEFORE.html as proper XHTML it then XMLparses
correctly as:
So in the first case you are saying “hey Lucee, xmlParse this HTML5 doc,
which doesn’t need stuff to be closed”, and it then parses it following the
rules of HTML5.
If you properly mark it up as XHTML then it DOES parse it with the XHTML
rules. Basically Lucee is doing what you asked it to.
16 August 2015 10:25
You are somewhat missing the point, Mark. The source document
/is/ XML. The dialect of mark-up the file contains might be XHTML, but
it’s well-formed and it is valid XML. Look at it.
So all the operation is doing is taking some XML, converting it to a
Lucee XML object, then turning it back into XML again.
And Lucee is ballsing up the last step. That is the issue here: the
toString() function is not emitting valid XML. The problem does not
lie with the initial parsing.
On Sunday, 16 August 2015 10:04:14 UTC+1, Mark Drew wrote:
It is maintaining the actual format of the HTML even though you used
xmlParse; in an (odd?) way I can see why it would do this.
The output of htmlParse gives you more XML-like output:
<?xml version="1.0" encoding="UTF-8"?><html xmlns="http://www.w3.org/1999/xhtml" lang="en">
XMLParse and toString Test
XMLParse and toString Test
Which I think is more akin to what you were expecting (closing the br
and img tags)
Hope that helps
Regards
Mark Drew
Marilou Landes
16 August 2015 03:55
I will look at the htmlParse function, but the issue is really with
the toString(), whereby the xml document object is rewritten as a
string. Null elements, like need to be closed properly, as in
.
So you are using xmlParse to turn a non-XML document into an XML
document? (The toString is just serializing it.)
On Sunday, 16 August 2015 10:04:14 UTC+1, Mark Drew wrote:
I did the following example, using your test files and I got the output
that you might expect
That’s the source of before.html according to my machine. Are you seeing
something different?
(In reply to Adam Cameron, 16 Aug 2015 20:39)
On Sunday, 16 August 2015 10:51:31 UTC+1, James Holmes wrote:
I looked at it. I see this:
<META http-equiv="Content-Type" content="text/html; charset=utf-8">
XMLParse and toString Test
That is not valid XML.
No, that isn’t. But… where did you get that from? It’s not the doc
under discussion here.
Could be. I think it might be the SAX parser, but that is as far as I got looking into the XMLUtil and parseXml funcs.
Mark Drew
Sent by typing with my thumbs.
On 16 Aug 2015, at 11:47, Adam Cameron wrote:
Whatever it’s doing, it’s predicated on the outer tags being . If I just change those to be - even if I leave the - then it behaves as one would expect it to.
So it does seem like it’s performing some unsolicited guess work here.
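A possible explanation for why the root element name matters, offered as a guess and not checked against Lucee’s source: the XSLT 1.0 output rules say that when no output method is set, a serializer defaults to the html method if the root element is named html (in any case) and has no namespace. The html method writes no XML declaration, leaves empty elements such as br unclosed, and adds a META content-type element to the head, which matches the symptoms reported in this thread. The Java sketch below only puts the two output methods side by side using the JDK’s built-in transformer. SerializeDemo and SOURCE are made-up names, and SOURCE is a stand-in for before.html, not the real file.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class SerializeDemo {
    // Hypothetical stand-in for before.html: well-formed, no DOCTYPE, no namespace.
    static final String SOURCE =
        "<html><head><title>XMLParse and toString Test</title></head>"
        + "<body><p>line one<br/>line two</p></body></html>";

    // Parse a well-formed string into a DOM document.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse input", e);
        }
    }

    // Serialize the document with an explicit output method ("xml" or "html").
    static String serialize(Document doc, String method) {
        try {
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.METHOD, method);
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("could not serialize document", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse(SOURCE);
        // xml method: XML declaration present, empty elements self-closed
        System.out.println(serialize(doc, "xml"));
        // html method: no XML declaration, <br> left unclosed
        System.out.println(serialize(doc, "html"));
    }
}
```

If toString() does something similar internally without forcing the xml method, that would account for the difference from ACF, but that part is an assumption on my side.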
Yup. The file contains this:
XMLParse and toString Test
XMLParse and toString Test
That’s from the actual file, not from browsing to it and looking at the
source. I can only presume your browser is showing you a back-working of
its parsed DOM document, not the original file contents.
(In reply to James Holmes, Sunday, 16 August 2015 13:01:30 UTC+1)
Sorry, you’re fixating on a tangential explanatory comment I made, which
overall is not (well, ought not to be) relevant. I was using the term “XHTML”
merely as an explanation that HTML can indeed also be XML. XHTML has more
rules beyond just that, as you point out.
xmlParse() should not be guessing at dialects, it should be doing what it’s
told: here’s some XML… parse it.
I slightly misspoke before, though. I should have restricted my description
of the doc’s appropriateness to the notion of “well-formed”, not “valid”.
On reading the XML spec, these are two different things (I did not realise there
was this distinction, nor that those terms are meaningful in the context of
XML). XML is “well-formed” if its tags balance out etc. But to be “valid”
it needs to have a doctype and a DTD (although I suspect these days a
schema would also be OK in lieu of a DTD… I’m only reading this thing
superficially).
Now… xmlParse() could have been implemented to reject any non-VALID XML,
however I think we can all agree that’s unnecessarily restrictive. Who here
has really spent much time ensuring their XML has a DTD, and the XML
follows it? I actually have in the past, but not for a very long time as it
all seems like a lot of work for little real-world gain. Similarly with
schemas. Anyhow, Adobe took the pragmatic route and decided “well-formed”
was the requirement for an XML string to be parseable. This is the cue
Railo and then Lucee should (and as far as I can tell: did) take.
If an XML string has DTD information, then the XML must comply with
the DTD. Same with a schema.
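To make the well-formed versus valid distinction concrete, here is a small Java sketch using the JDK’s DOM parser. It is only an illustration of the two checks, not a claim about how xmlParse() is implemented in either engine; WellFormedVsValid and its method names are invented for the example. A non-validating parse accepts any well-formed input, while a validating parse of the same input reports errors when there is no DTD to validate against.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

final class WellFormedVsValid {
    // Parses the input and returns the validity errors the parser reported.
    // Throws SAXException when the input is not well-formed.
    private static List<String> run(String xml, boolean validating) throws SAXException {
        List<String> errors = new ArrayList<>();
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(validating);
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new ErrorHandler() {
                @Override public void warning(SAXParseException e) { /* ignored */ }
                // Validity problems are recoverable: collect them and carry on.
                @Override public void error(SAXParseException e) { errors.add(e.getMessage()); }
                // Well-formedness problems are fatal: stop parsing.
                @Override public void fatalError(SAXParseException e) throws SAXException { throw e; }
            });
            builder.parse(new InputSource(new StringReader(xml)));
        } catch (IOException | ParserConfigurationException e) {
            throw new IllegalStateException("parser setup failed", e);
        }
        return errors;
    }

    // True when a non-validating parse succeeds.
    static boolean isWellFormed(String xml) {
        try {
            run(xml, false);
            return true;
        } catch (SAXException e) {
            return false;
        }
    }

    // Validity errors from a validating parse; the input must be well-formed.
    static List<String> validityErrors(String xml) {
        try {
            return run(xml, true);
        } catch (SAXException e) {
            throw new IllegalArgumentException("input is not well-formed", e);
        }
    }

    public static void main(String[] args) {
        String doc = "<p>a<br/>b</p>";
        System.out.println("well-formed: " + isWellFormed(doc));
        System.out.println("validity errors: " + validityErrors(doc));
        System.out.println("unclosed br well-formed: " + isWellFormed("<p>a<br>b</p>"));
    }
}
```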
However the XML parser should not make unsolicited guesses as to what the
dialect of the XML is. If no such information is provided, then no such
information should be inferred. And if Lucee is doing that (as you
speculate it might be), it is wrong to do so.
(In reply to Mark Drew, Sunday, 16 August 2015 10:51:07 UTC+1)