robrighter/node-xml

Time to parse file is O(n^2) performance wrt to file size

Opened this issue · 3 comments

I have an xml file containing a table. The parsing speed is very slow. I notice a pattern though:

110s for parsing the full file
25s when half the rows are removed
Seems like textbook quadratic growth.

I also observe the per row processing time decrease as the parser advances through more rows; ie rows occurring later in the file take less time.

Just tested using fs.createReadStream to read the file in chunks; parsing the whole file now takes 10s (which is still bad for 450k in my opinion)

Adding this code:

var fc = this.m_xml.charAt(this.m_iP);
if (fc !== '<' && fc !== '&') {
    return this._parseText   (this.m_iP);        
}

at the top of XMLP.prototype._parse cuts down the time by half. I have a 1MB file which used to take 4+ minutes and now takes 2m 12s. Problem is there's lots of repeated scanning using indexOf which results in the O(n^2) performance.

pcapr is right on with this issue. I added some profiling and this lib took 3-4 sec to parse a small (170KB) xml file with 99% of the work being indexOf calls. Give the n^2 growth, this will lead to a performance nightmare for medium to large xmls. This is can be improved significantly by passing the xml in chunks rather than a big string, but we've had so many issues already that it's easier just to drop the library.