Parse an XML File

danielruppen · May 5, 2021, 4:57am

Hi

I am struggling to walk over an XML file and extract data. The attached XML is an example of the file I am trying to walk over, and I am looking for the data in the table. In fact, I'd like to loop over the ticket numbers between 54505586 and 57949164 and extract the data.

https://drive.google.com/file/d/14H_cuzLvQX3LCuYEslR-UETYOeTpLrnv/view?usp=sharing

Can anyone give me a hint, please?

Many thanks

Daniel

walterseethaler · May 5, 2021, 11:02am

In general:

Load the XML into a variable.
Parse the XML. This returns an “XML document object”.
The XML document object represents the whole XML and can be used like structs and arrays.
Search the XML document object with XPath (XPath Syntax).
The search result can be used in the same way as the XML document object.

<cfscript>
    xmlData = "<foo><bar/></foo>";
    xmlObj = xmlParse(xmlData);
    result = xmlSearch(xmlObj, "//bar");
    dump(xmlObj);
    dump(result);
</cfscript>

danielruppen · May 5, 2021, 1:55pm

Thank you, Walter.
This is the first time I have really dealt with XML, so I will give it a try and report back.
Regards
Daniel

danielruppen · May 6, 2021, 2:21am

Hi Walter
Hmm, I still can't find anything. If I upload my XML file to an XML parser and search for the first ticket number, I get the following path back: object/html/body/div/table/tr
However, using xmlSearch with that path returns an empty array. I can dump the whole XML and follow the path, so it's all there, but xmlSearch doesn't find my first 'tr'.

Regards
Daniel

danielruppen · May 6, 2021, 2:30am

This is very weird. I've tried so many options, but nothing seems to work. My steps are:

download the HTML file
apply xmlParse to that file
xmlSearch

The example file is in my first post.

Regards
Daniel

walterseethaler · May 6, 2021, 11:22am

The easiest XPath to get all rows of your HTML data would be:

xmlSearch(xmlObj, "//tr");

In your example, I assume the leading "object" step is wrong:

xmlSearch(xmlObj, "html/body/div/table/tr");

walterseethaler · May 6, 2021, 11:30am

If xmlSearch doesn't work, you can still walk the XML document object.
This works as long as your HTML data always has the same structure.

<cfscript>
    //...
    xmlObj = xmlParse(xmlData);
    rows = xmlObj.html.body.div.table.xmlChildren;
    currentRow = 0;
    for(row in rows){
        currentRow++;
        //the first 4 rows are header
        if(currentRow < 5){ continue; }
        //row 5, 6 ...
        dump(row.xmlChildren[1]);
    }
    echo(xmlData);
</cfscript>

A gist with your data (had to replace # with ##).

walterseethaler · May 6, 2021, 11:41am

When I tried to use xmlSearch in the gist with your data, it returned nothing. The result was the same in Lucee and in the CF versions.

This may be a bug in CF or the XML libraries, a problem with your data, or just a problem with the gist servers. I don't know.
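
One possible explanation, which I have not verified against the attached file: if the html element declares a default namespace (for example xmlns="http://www.w3.org/1999/xhtml"), an unprefixed XPath step such as //tr only matches elements that are in no namespace, so it finds nothing. The sketch below uses a made-up sample document, not the real data, to compare the plain search with a local-name() search. writeDump is used instead of dump so the snippet is not tied to Lucee.

```cfml
<cfscript>
    // Hypothetical sample: an XHTML-style document with a default namespace.
    // The real file may or may not look like this.
    xmlData = '<html xmlns="http://www.w3.org/1999/xhtml"><body><table><tr><td>54505586</td></tr></table></body></html>';
    xmlObj = xmlParse(xmlData);

    // Unprefixed element names in XPath 1.0 refer to elements in no namespace
    plain = xmlSearch(xmlObj, "//tr");

    // local-name() compares only the element name and ignores the namespace
    byLocalName = xmlSearch(xmlObj, "//*[local-name()='tr']");

    // Compare how many nodes each search returned
    writeDump(arrayLen(plain));
    writeDump(arrayLen(byLocalName));
</cfscript>
```

If the parser is namespace-aware, I would expect the first count to be zero and the second to be one, but the result depends on the engine and its XML libraries.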

cfmitrah · May 6, 2021, 11:56am

And this works with XmlSearch():

<cfset xmlFile = fileread(expandpath("./statement.xml"))>
<cfset res = xmlParse(xmlFile)>
<cfset ser = XmlSearch(res,'//*[local-name()="table"]')> <!--- Returns table --->
<cfset serTr = XmlSearch(ser[1],'//*[local-name()="tr"]')> <!--- Returns tr --->
<cfdump var="#serTr#">
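
Building on that search, here is a rough sketch of looping over the rows and keeping only those whose ticket number lies between 54505586 and 57949164. It assumes the ticket number is the text of the first cell in each row and that "between" means a numeric range. Both are guesses about the real file, and the inline sample data is made up.

```cfml
<cfscript>
    // Hypothetical stand-in for the real statement file
    xmlData = '<html xmlns="http://www.w3.org/1999/xhtml"><body><table>'
        & '<tr><td>Ticket</td><td>Amount</td></tr>'
        & '<tr><td>54505586</td><td>10.00</td></tr>'
        & '<tr><td>57949164</td><td>20.00</td></tr>'
        & '<tr><td>60000000</td><td>30.00</td></tr>'
        & '</table></body></html>';
    xmlObj = xmlParse(xmlData);
    rows = xmlSearch(xmlObj, "//*[local-name()='tr']");

    firstTicket = 54505586;
    lastTicket = 57949164;
    matches = [];

    for (row in rows) {
        cells = row.xmlChildren;
        // Skip rows without any cells
        if (arrayLen(cells) == 0) { continue; }

        // Assumption: the first cell holds the ticket number
        ticket = trim(cells[1].xmlText);

        // Header rows fail the isNumeric check and are skipped
        if (isNumeric(ticket) && ticket >= firstTicket && ticket <= lastTicket) {
            rowData = [];
            for (cell in cells) {
                arrayAppend(rowData, trim(cell.xmlText));
            }
            arrayAppend(matches, rowData);
        }
    }

    writeDump(matches);
</cfscript>
```

Each entry in matches is an array of the cell texts for one row, which should be easier to feed into an import than the raw XML nodes.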

danielruppen · May 7, 2021, 1:25am

Thanks, Walter and Raymond.

XmlSearch(ser[1], '//*[local-name()="tr"]') was the solution. I'm not sure why my htm/xml file doesn't behave like a normal XML file, but now I can access the data, and my import works without the manual workaround (opening the file in Excel and saving it as CSV).

Many thanks!!
Daniel