Abort/Cancel XmlParsing (using SAX)

I used this helpful guide to get started parsing XML with SAX: "XML fast And Easy :: Lucee Documentation".

I need to parse a large XML document, so I’d like to be able to stop parsing part way through. Using Abort doesn’t seem to work.

The workaround suggested on some Java sites is to throw a SAXException, which I’m having trouble doing. I tried creating a SAXError and throwing it, but I’m doing something wrong. Any help would be appreciated. Please see the code below…

component {
	
	/**
	* Start parsing an xml file.
	* @param xmlFile xml file to parse
	*/
	function init(string xmlFile) {
		variables.xmlFile=arguments.xmlFile;
		variables.cnt = 0;
		variables.xmlEventParser=createObject("java","lucee.runtime.helpers.XMLEventParser");
	}

	function execute() {
		variables.xmlEventParser.init(
			getPageContext(),
			this.startDocument,
			this.startElement,
			this.body,
			this.endElement,
			this.endDocument,
			this.error
		);
		xmlEventParser.start(xmlFile);
	}


	/**
	* This function will be called on to start parsing an XML Element (Tag).
	*/
	public void function startElement(string uri, string localName, string qName, struct attributes) {
		variables.cnt++;
		if (variables.cnt eq 10) {

			//NONE OF THESE WORK FOR ME.

			//this throws: "No matching Method/Function for lucee.runtime.helpers.XMLEventParser.error() found". But the method is there when object dumped
			//variables.xmlEventParser.error();

			//I tried with a SAXError arg, but not sure I'm doing it right. method still not found
			var SAXException = createObject("java", "org.xml.sax.SAXParseException");
			//variables.xmlEventParser.error(SAXException);

			//tried a regular cf throw with java object. no good.
			throw(object=SAXException);

		}
	}

	/**
	* call with body of content
	*/
	public void function body(string content) {}

	/**
	* This function will be called at the end of parsing an XML Element (Tag).
	*/
	public void function endElement(string uri, string localName, string qName, struct attributes) {}

	/**
	* This function will be called at the start of parsing a document.
	*/
	public void function startDocument() {}
	
	/**
	* This function will be called at the end of parsing a document.
	*/
	public void function endDocument() {}
	
	/**
	* This function will be called when an error occurs.
	*/
	public void function error(struct cfcatch) {
		echo(cfcatch);
	}
}

I don’t quite understand why you’re creating Java objects directly and not just using XMLParse(), but throwing an exception from your code isn’t going to do anything at all. Presumably the XML parsing is a synchronous (blocking) operation, so your exception isn’t going to run until after the flow control has returned to your thread. If you want to be able to stop the parsing, you’d need to start it in a separate thread and then “interrupt” the thread that’s doing the parsing. That, of course, assumes that the library in question is interruptible.

Thanks. I’m using the SAX library per this tutorial ("XML fast And Easy :: Lucee Documentation") instead of xmlParse because it seems more efficient when dealing with large files.

My cursory understanding is that SAX is an event-based parser that doesn’t require you to read the entire file into memory. Instead, as the file is read, events are triggered which are hooked to callbacks (see the constructor).
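To make the event model concrete, here is a minimal plain-Java sketch (not Lucee's XMLEventParser) that records which callbacks fire and in what order. The class name SaxEventOrder and the method recordEvents are made up for illustration; the parser calls are the standard JDK SAX API.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class: logs SAX callbacks as the parser streams through the input.
class SaxEventOrder {

    // Parses the XML string and returns the callback names in the order they fired.
    public static List<String> recordEvents(String xml) throws Exception {
        List<String> events = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startDocument() {
                events.add("startDocument");
            }

            @Override
            public void startElement(String uri, String localName, String qName, Attributes attributes) {
                events.add("startElement:" + qName);
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // The parser may deliver text in several chunks; skip whitespace-only ones.
                String text = new String(ch, start, length).trim();
                if (!text.isEmpty()) {
                    events.add("characters:" + text);
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                events.add("endElement:" + qName);
            }

            @Override
            public void endDocument() {
                events.add("endDocument");
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return events;
    }

    public static void main(String[] args) throws Exception {
        for (String event : recordEvents("<items><item>a</item><item>b</item></items>")) {
            System.out.println(event);
        }
    }
}
```

Each callback runs inside the parse() call itself, on the same thread, which is why code in a callback can affect the parse while it is still in progress.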

So you can, for instance, run code right before each node is parsed, including CF code (within the callbacks). I feel like I should be able to throw an exception within that callback code and it should interrupt the operation (because that callback code is part of the operation itself).

Here’s a java discussion of it: java - How to stop parsing xml document with SAX at any time? - Stack Overflow

In that example, they throw an exception based on some condition in the startElement callback (which is called right before a new XML node is parsed, afaict). I’d like to do the same thing in CF, throw a SAX exception, but I’m not sure I’m doing it right.
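For reference, the plain-Java version of that idea looks roughly like the sketch below. It is not the Lucee API: SaxEarlyStop, StopParsing and countUpTo are names I made up, and it assumes the JDK's built-in SAX parser, which passes a SAXException thrown from a handler back out of parse().

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class: stops a SAX parse early by throwing from startElement.
class SaxEarlyStop {

    // Marker exception (an assumption of this sketch) so a deliberate stop
    // can be told apart from a genuine parse error.
    static class StopParsing extends SAXException {
        StopParsing(String message) {
            super(message);
        }
    }

    // Returns how many start tags were seen before parsing ended or was stopped at `limit`.
    public static int countUpTo(String xml, int limit) throws Exception {
        int[] seen = {0};
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes attributes)
                    throws SAXException {
                seen[0]++;
                if (seen[0] == limit) {
                    // Throwing here unwinds out of parse() and ends the parse.
                    throw new StopParsing("element limit reached");
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (StopParsing stop) {
            // Expected: the handler asked to stop, so this is not treated as a failure.
        }
        return seen[0];
    }

    public static void main(String[] args) throws Exception {
        System.out.println(countUpTo("<r><a/><b/><c/><d/></r>", 3));
    }
}
```

Whether the same thing works from a CFML callback depends on how XMLEventParser handles exceptions raised inside the callbacks, which this sketch does not cover.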

Am I way off base?

Ahh, interesting-- I’ve never used the SAX listener stuff. I’m not sure how the exception works-- I’m guessing the SAX library catches that specific exception class. I don’t think there is a way to throw a specific Java exception in CFML, but to be honest, I’ve never tried. What happens when you throw your exception that you showed above? Does it stop the parsing? Create any logs?

I get: “No matching Method/Function for lucee.runtime.helpers.XMLEventParser.error(org.xml.sax.SAXException) found”

I just don’t really know what I’m doing when it comes to Java, so I don’t know if I’m just not creating the object correctly or what.

Anyway, it’s kind of an arcane issue that I should stop worrying about on a Saturday night (tho when in quarantine, what else am I supposed to do!)

thanks for the help. stay safe! stay sane!

It would appear that method in the Lucee code only accepts a SAXParseException and not a SAXException.

	@Override
	public void error(SAXParseException e) throws SAXException {
		error(Caster.toPageException(e));
	}

I’m not sure what methods are getting called in which order since you didn’t include the stack trace. It does appear that Lucee is just following the Java interface defined for a SAX listener.

https://docs.oracle.com/javase/7/docs/api/org/xml/sax/helpers/DefaultHandler.html#error(org.xml.sax.SAXParseException)

You could try generating and throwing a SAXParseException instead, but to be honest I’m just sort of guessing at this point :)
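In plain Java terms, a rough sketch of what I mean is below. SAXParseException has no no-argument constructor, so an instance needs at least a message and a Locator (which may be null), and error() only accepts that subclass. The names SaxParseExceptionDemo, StrictHandler, makeStop and errorRethrows are made up for illustration, and none of this is tested against Lucee.

```java
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class: builds a SAXParseException and passes it to error().
class SaxParseExceptionDemo {

    // Builds an instance; with a null Locator the line and column are reported as -1.
    public static SAXParseException makeStop(String message) {
        return new SAXParseException(message, null);
    }

    // A handler whose error() rethrows, turning an error report into a hard stop.
    // DefaultHandler's own error() does nothing.
    static class StrictHandler extends DefaultHandler {
        @Override
        public void error(SAXParseException e) throws SAXException {
            throw e;
        }
    }

    // Returns true if error() rethrew the exception it was given.
    public static boolean errorRethrows(String message) {
        try {
            // Passing a plain SAXException here would not compile:
            // the parameter type is SAXParseException.
            new StrictHandler().error(makeStop(message));
            return false;
        } catch (SAXException e) {
            return message.equals(e.getMessage());
        }
    }

    public static void main(String[] args) {
        System.out.println(makeStop("stop").getLineNumber());
        System.out.println(errorRethrows("stop"));
    }
}
```

If the CFML side only holds a class reference rather than a constructed instance, the method lookup could fail in a similar way, but that is a guess on my part.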

Good catch. Tried and failed. The error seems to get thrown right away, when the object is created. It does occur in the flow of the parser (the image shows the loop counter dumps with the error in the middle of them), but doesn’t stop the parser. Thanks for your help. I’m going to take a different route.