rkalla/simple-java-xml-parser

Attributes with length 1 are rejected by parser

I am parsing XML from OpenStreetMap. It looks like this:

<tag k="highway" v="residential"/>

The parser chokes when parsing the "k" attribute.

I set up my rule as:

IRule wayTagRule = new DefaultRule(IRule.Type.ATTRIBUTE, "/osm/way/tag", "k", "v");

I get

com.thebuzzmedia.sjxp.XMLParserException: local name for rule looks to be missing for IRule: scamsoft.randomwalk.mapfetch.MapBuilder$4[type=ATTRIBUTE, locationPath=/osm/way/tag, attributeNames=k,v]
    at com.thebuzzmedia.sjxp.XMLParser.doStartTag(XMLParser.java:714)
    at com.thebuzzmedia.sjxp.XMLParser.doParse(XMLParser.java:579)
    at com.thebuzzmedia.sjxp.XMLParser.parse(XMLParser.java:457)
    at com.thebuzzmedia.sjxp.XMLParser.parse(XMLParser.java:297)

I think this is a simple "off by 1" error.
XMLParser.java at line 713 should be:

if (attrName.length() - startIndex <= 0)
    throw new XMLParserException(
        "local name for rule looks to be missing for IRule: "
        + rule);

I am using the current stable version (2.2).
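
To make the suspected off-by-one concrete, here is a minimal standalone sketch. It is not the sjxp source: the class and method names are hypothetical, the `<= 1` form of the existing check is only my guess from the symptom, and `startIndex` is assumed to be 0 for an attribute name with no namespace prefix.

```java
// Hypothetical sketch of the length check; NOT the actual sjxp code.
class AttributeNameCheckSketch {

    // Assumed shape of the existing check: rejects a name when one or
    // fewer characters remain after startIndex.
    static boolean rejectedByAssumedCheck(String attrName, int startIndex) {
        return attrName.length() - startIndex <= 1;
    }

    // Check proposed in this report: rejects a name only when no
    // characters remain after startIndex.
    static boolean rejectedByProposedCheck(String attrName, int startIndex) {
        return attrName.length() - startIndex <= 0;
    }

    public static void main(String[] args) {
        int startIndex = 0; // assumption: no namespace prefix to skip
        for (String name : new String[] {"k", "v", "ref", ""}) {
            System.out.println("name=\"" + name + "\""
                + " assumedCheckRejects=" + rejectedByAssumedCheck(name, startIndex)
                + " proposedCheckRejects=" + rejectedByProposedCheck(name, startIndex));
        }
    }
}
```

Under these assumptions, one-character names such as "k" and "v" are rejected by the `<= 1` comparison but accepted by `<= 0`, while an empty name is still rejected by both.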

First, I'd like to thank you for the very detailed bug report (and even digging down to try and find the bug).

Second, can you attach the exact XML you are parsing and the pertinent parsing code you are testing with -- if the code is all part of a bigger class and would be a pain to refactor out, then just the example XML would be really helpful.

This is an example XML that produces the problem:

<?xml version="1.0" encoding="UTF-8"?>
<osm version="0.6" generator="CGImap 0.0.2">
<way id="4588689" user="swanilli" uid="104101" visible="true" version="4" changeset="2832346" timestamp="2009-10-13T05:31:35Z">
  <nd ref="29023907"/>
  <nd ref="29023908"/>
  <tag k="bicycle" v="no"/>
  <tag k="foot" v="yes"/>
  <tag k="highway" v="path"/>
  <tag k="name" v="Seventh Avenue"/>
  <tag k="surface" v="unpaved"/>
 </way>
 </osm>

Put it in a file called "badData.xml" and call this:

    public static void main(String[] args) {
        try {
            IRule wayTagRule = new DefaultRule(IRule.Type.ATTRIBUTE, "/osm/way/tag", "k", "v");
            XMLParser parser = new XMLParser(wayTagRule);
            parser.parse(new FileInputStream(new File("badData.xml")));
        }
        catch (FileNotFoundException e) {
            e.printStackTrace();
        }
    }

I get

Exception in thread "main" com.thebuzzmedia.sjxp.XMLParserException: local name for rule looks to be missing for IRule: com.thebuzzmedia.sjxp.rule.DefaultRule[type=ATTRIBUTE, locationPath=/osm/way/tag, attributeNames=k,v]
    at com.thebuzzmedia.sjxp.XMLParser.doStartTag(XMLParser.java:714)
    at com.thebuzzmedia.sjxp.XMLParser.doParse(XMLParser.java:579)
    at com.thebuzzmedia.sjxp.XMLParser.parse(XMLParser.java:457)
    at com.thebuzzmedia.sjxp.XMLParser.parse(XMLParser.java:297)

That seems to have fixed it for me.

btw this parser is SUPER FAST. I just converted a XOM project to sjxp and (now that I've got it working) it is TEN TIMES faster. Awesome. You should be screaming this from your front page. "sjxp is F***ing FAST!" I only tried this parser because someone suggested it elsewhere on the net... your front page didn't really give me much of an idea of the extra speed I would get.

btw I also like the API. Nice simple design.

I really appreciate the kind words!

I spent weeks profiling and tuning to make sure the library was painfully fast and created next to no garbage at runtime, so it could run non-stop in a server-side process (e.g. an indexer).