codehaus-plexus/plexus-utils

Uncaught IllegalArgumentException due to malformed unicode entity ref

rohanpadhye opened this issue · 2 comments

Sample Maven pom.xml below:

<project name="&#xFFFFFF;"></project>

0xFFFFFF is not a valid Unicode code point (the maximum is 0x10FFFF). This leads to the following uncaught exception from plexus-utils when running mvn:

Caused by: java.lang.IllegalArgumentException
    at org.codehaus.plexus.util.xml.pull.MXParser.toChars (MXParser.java:4023)
    at org.codehaus.plexus.util.xml.pull.MXParser.parseEntityRef (MXParser.java:2727)
    at org.codehaus.plexus.util.xml.pull.MXParser.parseAttribute (MXParser.java:2522)
    at org.codehaus.plexus.util.xml.pull.MXParser.parseStartTag (MXParser.java:2218)
    at org.codehaus.plexus.util.xml.pull.MXParser.parseProlog (MXParser.java:1801)
    at org.codehaus.plexus.util.xml.pull.MXParser.nextImpl (MXParser.java:1698)
    at org.codehaus.plexus.util.xml.pull.MXParser.next (MXParser.java:1317)
    at org.apache.maven.model.io.xpp3.MavenXpp3ReaderEx.read (MavenXpp3ReaderEx.java:4417)
    at org.apache.maven.model.io.xpp3.MavenXpp3ReaderEx.read (MavenXpp3ReaderEx.java:598)
    at org.apache.maven.model.io.DefaultModelReader.read (DefaultModelReader.java:105)
    at org.apache.maven.model.io.DefaultModelReader.read (DefaultModelReader.java:82)
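
For comparison, the JDK's own Character.toChars rejects the same value with the same exception type. The sketch below assumes MXParser.toChars follows that contract (the trace only shows that it throws IllegalArgumentException); CodePointDemo is a hypothetical name used for illustration.

```java
final class CodePointDemo {
    // Returns true when Character.toChars rejects the value
    // with an IllegalArgumentException.
    static boolean rejects(int codePoint) {
        try {
            Character.toChars(codePoint);
            return false;
        } catch (IllegalArgumentException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // 0xFFFFFF lies above Character.MAX_CODE_POINT (0x10FFFF).
        System.out.println(Character.isValidCodePoint(0xFFFFFF));
        System.out.println(rejects(0xFFFFFF));
        // The largest valid code point is accepted.
        System.out.println(rejects(0x10FFFF));
    }
}
```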

I'm guessing the expected behavior is to throw an XmlPullParserException instead, to signal an unparsable entity reference.
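
A minimal sketch of that expected behavior, not the actual MXParser code: the unchecked failure is converted into a checked exception at the point where the character reference is decoded. EntityRefException is a hypothetical stand-in for the parser's checked exception, and decode is an illustrative helper, not a plexus-utils method.

```java
final class CharRefSketch {
    /** Hypothetical stand-in for the parser's checked exception. */
    static final class EntityRefException extends Exception {
        EntityRefException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    // Decodes the digits of a numeric character reference (the part between
    // "&#" or "&#x" and ";") into UTF-16 chars, reporting bad input as a
    // checked exception instead of letting a runtime exception escape.
    static char[] decode(String digits, boolean hex) throws EntityRefException {
        try {
            int codePoint = Integer.parseInt(digits, hex ? 16 : 10);
            return Character.toChars(codePoint);
        } catch (IllegalArgumentException e) {
            throw new EntityRefException("invalid character reference: " + digits, e);
        }
    }

    public static void main(String[] args) {
        try {
            decode("FFFFFF", true);
            System.out.println("decoded");
        } catch (EntityRefException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

With this shape, a caller that already handles the parser's checked exception would see the malformed reference as an ordinary parse error.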

Found using JQF.

Note that if the entity ref is not a valid integer, then a NumberFormatException is thrown instead, from Integer.parseInt():

<project name="&#FFFFFFFFFFF;"></project>

Caused by: java.lang.NumberFormatException: For input string: "FFFFFFFFFFF"
    at java.lang.NumberFormatException.forInputString (NumberFormatException.java:65)
    at java.lang.Integer.parseInt (Integer.java:652)
    at org.codehaus.plexus.util.xml.pull.MXParser.parseEntityRef (MXParser.java:2727)
    ...

However, PR #58 appears to fix this as well, since NumberFormatException is a subclass of IllegalArgumentException, and #58 catches the latter. Thanks @belingueres.
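
The subclass relationship can be checked with plain JDK calls. The sketch below is illustrative only and does not reproduce the code in PR #58; ExceptionKindDemo and failure are hypothetical names.

```java
final class ExceptionKindDemo {
    // Returns the exception raised while decoding the digits,
    // or null if decoding succeeds.
    static RuntimeException failure(String digits, int radix) {
        try {
            Character.toChars(Integer.parseInt(digits, radix));
            return null;
        } catch (IllegalArgumentException e) {
            // This single catch also receives NumberFormatException,
            // because it extends IllegalArgumentException.
            return e;
        }
    }

    public static void main(String[] args) {
        // Parses as an int, but is not a valid code point.
        System.out.println(failure("FFFFFF", 16).getClass().getName());
        // Does not parse as a decimal int at all.
        System.out.println(failure("FFFFFFFFFFF", 10).getClass().getName());
    }
}
```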

PR #58 merged