MXParser tokenization fails when PI is before first tag
belingueres opened this issue · 3 comments
Consider these valid XML documents:
<?a?>
<test>nnn</test>
and
<?xml version="1.0" encoding="UTF-8"?>
<?a?>
<test>nnn</test>
Those tests fail then parsing the PI, returning instead START_DOCUMENT.
@Test
public void testProcessingInstructionTokenizeBeforeFirstTag()
throws Exception
{
String input = "<?a?><test>nnn</test>";
MXParser parser = new MXParser();
parser.setInput( new StringReader( input ) );
assertEquals( XmlPullParser.PROCESSING_INSTRUCTION, parser.nextToken() );
assertEquals( XmlPullParser.START_TAG, parser.nextToken() );
assertEquals( XmlPullParser.TEXT, parser.nextToken() );
assertEquals( XmlPullParser.END_TAG, parser.nextToken() );
}
@Test
public void testProcessingInstructionTokenizeAfterXMLDeclAndBeforeFirstTag()
throws Exception
{
String input = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><?a?><test>nnn</test>";
MXParser parser = new MXParser();
parser.setInput( new StringReader( input ) );
assertEquals( XmlPullParser.PROCESSING_INSTRUCTION, parser.nextToken() );
assertEquals( XmlPullParser.PROCESSING_INSTRUCTION, parser.nextToken() );
assertEquals( XmlPullParser.START_TAG, parser.nextToken() );
assertEquals( XmlPullParser.TEXT, parser.nextToken() );
assertEquals( XmlPullParser.END_TAG, parser.nextToken() );
}
I think the problem is the parsePI() method: it returns false when parsing the xml declaration (<?xml ...?>
), which sets the event to PROCESSING_INSTRUCTION, but returns true when encounters another PI, which set the event to START_DOCUMENT. The logic should be exactly the inverse.
However this breaks several tests (and surely several apps).
A middle point may be to always return false from parsePI() so that always return PROCESSING_INSTRUCTION? WDYT?
Just tested, tests still fail.
@belingueres @michael-o I also come up with the same fix. Also, according to the javadoc for START_DOCUMENT
, this event should only be returned before the parser actually parses anything, which goes in the same way.
The PR does that and fixes both tests.