AsyncByteScanner.validPublicIdChar() incorrectly rejects digits
dmolesUC opened this issue · 1 comments
dmolesUC commented
The javadoc for AsyncByteScanner.validPublicIdChar() references PubidLiteral in the XML 1.0 specification:
PubidLiteral ::= '"' PubidChar* '"' | "'" (PubidChar - "'")* "'"
PubidChar ::= #x20 | #xD | #xA | [a-zA-Z0-9] | [-'()+,./:=?;!*#@$_%]
Note that this includes [0-9]. However, the implementation of the method does not:
protected boolean validPublicIdChar(int c) {
    return
        c == 0xA || //<LF>
        c == 0xD || //<CR>
        c == 0x20 || //<SPACE>
        (c >= '@' && c <= 'Z') || //@[A-Z]
        (c >= 'a' && c <= 'z') ||
        c == '!' ||
        (c >= 0x23 && c <= 0x25) || //#$%
        (c >= 0x27 && c <= 0x2F) || //'()*+,-./
        (c >= ':' && c <= ';') ||
        c == '=' ||
        c == '?' ||
        c == '_';
}
Note also that com.fasterxml.aalto.util.XmlCharTypes.PUBID_CHARS correctly includes these digits.
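For illustration, here is a minimal standalone sketch of what a check that follows the PubidChar production could look like. It is not the actual Aalto patch: the class name PubidCharCheck and the helper firstInvalidIndex are hypothetical, and the only change from the method quoted above is the added [0-9] range.

```java
// Hypothetical standalone sketch; not part of Aalto itself.
class PubidCharCheck {

    /** Returns true if c is allowed by the XML 1.0 PubidChar production. */
    public static boolean validPublicIdChar(int c) {
        return
            c == 0xA ||                     // <LF>
            c == 0xD ||                     // <CR>
            c == 0x20 ||                    // <SPACE>
            (c >= '0' && c <= '9') ||       // [0-9], the range missing above
            (c >= '@' && c <= 'Z') ||       // @[A-Z]
            (c >= 'a' && c <= 'z') ||
            c == '!' ||
            (c >= 0x23 && c <= 0x25) ||     // #$%
            (c >= 0x27 && c <= 0x2F) ||     // '()*+,-./
            (c >= ':' && c <= ';') ||
            c == '=' ||
            c == '?' ||
            c == '_';
    }

    /** Returns the index of the first disallowed character, or -1 if none. */
    public static int firstInvalidIndex(String publicId) {
        for (int i = 0; i < publicId.length(); i++) {
            if (!validPublicIdChar(publicId.charAt(i))) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Public ID taken from the <!DOCTYPE> declaration in this issue.
        String publicId = "+// http://ead3.archivists.org/schema/ //DTD ead3 "
            + "(Encoded Archival Description (EAD) Version 3)//EN";
        System.out.println(firstInvalidIndex(publicId)); // prints -1
    }
}
```

With the digit range in place, every character of the EAD3 public ID passes, while characters outside the production (such as '"' or '&') are still rejected.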
Steps to reproduce:
- Find or create a document matching the Encoded Archival Description version 3 schema and containing the following <!DOCTYPE> declaration (e.g., add it to this file):
<!DOCTYPE ead PUBLIC "+// http://ead3.archivists.org/schema/ //DTD ead3 (Encoded Archival Description (EAD) Version 3)//EN" "ead3.dtd">
- Attempt to parse it with an AsyncXMLStreamReader.
Expected:
- File parses, or at any rate gets past the <!DOCTYPE> declaration.
Actual:
- Parsing fails with a WFCException:
Error parsing XML stream
com.fasterxml.aalto.WFCException: Unexpected character '3' (code 51) in prolog (not valid in PUBLIC ID)
at [row,col {unknown-source}]: [1,77]
    at com.fasterxml.aalto.in.XmlScanner.reportInputProblem(XmlScanner.java:1333)
    at com.fasterxml.aalto.in.XmlScanner.throwUnexpectedChar(XmlScanner.java:1498)
    at com.fasterxml.aalto.in.XmlScanner.reportPrologUnexpChar(XmlScanner.java:1358)
    at com.fasterxml.aalto.async.AsyncByteBufferScanner.parseDtdId(AsyncByteBufferScanner.java:1946)
    at com.fasterxml.aalto.async.AsyncByteBufferScanner.handleDTD(AsyncByteBufferScanner.java:1833)
    at com.fasterxml.aalto.async.AsyncByteBufferScanner.handlePrologDeclStart(AsyncByteBufferScanner.java:1264)
    at com.fasterxml.aalto.async.AsyncByteBufferScanner.nextFromProlog(AsyncByteBufferScanner.java:1067)
    at com.fasterxml.aalto.stax.StreamReaderImpl.next(StreamReaderImpl.java:790)
    at org.codehaus.stax2.ri.Stax2EventReaderImpl.nextEvent(Stax2EventReaderImpl.java:255)
cowtowncoder commented
Thank you for reporting this & contributing fix!