FasterXML/aalto-xml

AsyncByteScanner.validPublicIdChar() incorrectly rejects digits

dmolesUC opened this issue · 1 comments

The javadoc for AsyncByteScanner.validPublicIdChar() references PubidLiteral in the XML 1.0 specification:

PubidLiteral ::= '"' PubidChar* '"' | "'" (PubidChar - "'")* "'"
PubidChar    ::= #x20 | #xD | #xA | [a-zA-Z0-9] | [-'()+,./:=?;!*#@$_%]

Note that this includes [0-9]. However, the implementation of the method does not:

    protected boolean validPublicIdChar(int c) {
        return
            c == 0xA ||                     //<LF>
            c == 0xD ||                     //<CR>
            c == 0x20 ||                    //<SPACE>
            (c >= '@' && c <= 'Z') ||       //@[A-Z]
            (c >= 'a' && c <= 'z') ||
            c == '!' ||
            (c >= 0x23 && c <= 0x25) ||     //#$%
            (c >= 0x27 && c <= 0x2F) ||     //'()*+,-./
            (c >= ':' && c <= ';') ||
            c == '=' ||
            c == '?' ||
            c == '_';
    }

Note also that com.fasterxml.aalto.util.XmlCharTypes.PUBID_CHARS correctly includes these digits.
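
A minimal sketch of what a corrected check might look like, assuming the only change needed is to add the digit range to the method quoted above. The class name `PubidCharCheck` is hypothetical, and the method is made static so the sketch stands alone; this is not the actual patch applied to aalto-xml.

```java
class PubidCharCheck {
    // Mirrors the PubidChar production quoted from the XML 1.0 specification.
    static boolean validPublicIdChar(int c) {
        return
            c == 0xA ||                     //<LF>
            c == 0xD ||                     //<CR>
            c == 0x20 ||                    //<SPACE>
            (c >= '0' && c <= '9') ||       //[0-9], absent from the quoted method
            (c >= '@' && c <= 'Z') ||       //@[A-Z]
            (c >= 'a' && c <= 'z') ||
            c == '!' ||
            (c >= 0x23 && c <= 0x25) ||     //#$%
            (c >= 0x27 && c <= 0x2F) ||     //'()*+,-./
            (c >= ':' && c <= ';') ||
            c == '=' ||
            c == '?' ||
            c == '_';
    }

    public static void main(String[] args) {
        // Every digit should now be accepted.
        for (char c = '0'; c <= '9'; c++) {
            System.out.println(c + " -> " + validPublicIdChar(c));
        }
        // The double quote is not a PubidChar and should still be rejected.
        System.out.println("\" -> " + validPublicIdChar('"'));
    }
}
```

Keeping the digit range as its own clause, rather than widening the 0x27..0x2F range up to ';', keeps each line traceable to one part of the PubidChar production.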

Steps to reproduce:

  1. Find or create a document matching the Encoded Archival Description version 3 schema and containing the following <!DOCTYPE> declaration (e.g., add it to this file):
<!DOCTYPE ead PUBLIC "+// http://ead3.archivists.org/schema/ //DTD ead3 (Encoded Archival Description (EAD) Version 3)//EN" "ead3.dtd">
  2. Attempt to parse it with an AsyncXMLStreamReader.
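
The rejection can also be narrowed down without the parser, by running the public ID from the declaration above through a copy of the check as quoted in this report. This is a rough sketch only: the class name `PubidRepro` and the helper `firstRejected` are hypothetical, and the method body is copied from the issue text rather than from the library source.

```java
class PubidRepro {
    // Public ID taken from the <!DOCTYPE> declaration in the steps above.
    static final String PUBLIC_ID =
        "+// http://ead3.archivists.org/schema/ //DTD ead3 (Encoded Archival Description (EAD) Version 3)//EN";

    // Copy of the check as quoted in this report; note there is no [0-9] clause.
    static boolean validPublicIdChar(int c) {
        return
            c == 0xA ||
            c == 0xD ||
            c == 0x20 ||
            (c >= '@' && c <= 'Z') ||
            (c >= 'a' && c <= 'z') ||
            c == '!' ||
            (c >= 0x23 && c <= 0x25) ||
            (c >= 0x27 && c <= 0x2F) ||
            (c >= ':' && c <= ';') ||
            c == '=' ||
            c == '?' ||
            c == '_';
    }

    // Returns the index of the first character the check rejects, or -1 if none.
    static int firstRejected(String publicId) {
        for (int i = 0; i < publicId.length(); i++) {
            if (!validPublicIdChar(publicId.charAt(i))) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        int i = firstRejected(PUBLIC_ID);
        System.out.println(i < 0
            ? "all characters accepted"
            : "rejected '" + PUBLIC_ID.charAt(i) + "' at index " + i);
    }
}
```

The index reported here is relative to the start of the public ID string, so it is not expected to match the column number in the parser's error message.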

Expected:

  • File parses, or at any rate gets past the <!DOCTYPE> declaration.

Actual:

  • Parsing fails with a WFCException:
Error parsing XML stream
com.fasterxml.aalto.WFCException: Unexpected character '3' (code 51) in prolog (not valid in PUBLIC ID)
 at [row,col {unknown-source}]: [1,77]
	at com.fasterxml.aalto.in.XmlScanner.reportInputProblem(XmlScanner.java:1333)
	at com.fasterxml.aalto.in.XmlScanner.throwUnexpectedChar(XmlScanner.java:1498)
	at com.fasterxml.aalto.in.XmlScanner.reportPrologUnexpChar(XmlScanner.java:1358)
	at com.fasterxml.aalto.async.AsyncByteBufferScanner.parseDtdId(AsyncByteBufferScanner.java:1946)
	at com.fasterxml.aalto.async.AsyncByteBufferScanner.handleDTD(AsyncByteBufferScanner.java:1833)
	at com.fasterxml.aalto.async.AsyncByteBufferScanner.handlePrologDeclStart(AsyncByteBufferScanner.java:1264)
	at com.fasterxml.aalto.async.AsyncByteBufferScanner.nextFromProlog(AsyncByteBufferScanner.java:1067)
	at com.fasterxml.aalto.stax.StreamReaderImpl.next(StreamReaderImpl.java:790)
	at org.codehaus.stax2.ri.Stax2EventReaderImpl.nextEvent(Stax2EventReaderImpl.java:255)

Thank you for reporting this & contributing a fix!