FasterXML/aalto-xml

Can't use ENTITY_REFERENCE event for resolution in an Attribute

otcdlink-simpleuser opened this issue · 3 comments

I need to resolve custom XML entities in my own event-handling code. Sadly, an undeclared entity in an attribute value doesn't trigger an XMLEvent.ENTITY_REFERENCE event. If this feature is not implemented, is there a workaround?

Here is a test case showing what I expect from Aalto. I'm using aalto-xml 1.1.0.

package com.otcdlink.chiron.wire;

import org.junit.jupiter.api.Test;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import javax.xml.stream.events.XMLEvent;
import java.io.StringReader;

import static org.junit.jupiter.api.Assertions.fail;


public class StaxPlayground {

  @Test
  void entityReplacement() throws XMLStreamException {
    final XMLInputFactory xmlInputFactory =
        new com.fasterxml.aalto.stax.InputFactoryImpl() ;
//        javax.xml.stream.XMLInputFactory.newInstance() ;

    xmlInputFactory.setProperty( XMLInputFactory.SUPPORT_DTD, true ) ;
    xmlInputFactory.setProperty( XMLInputFactory.IS_REPLACING_ENTITY_REFERENCES, false ) ;
    xmlInputFactory.setProperty( XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, false ) ;
    xmlInputFactory.setProperty( XMLInputFactory.IS_COALESCING, false ) ;

    final String xml =
        "<?xml version=\"1.0\" encoding=\"utf-8\"?>\n" +
        "<whatever x='&replace-me;' />"
//        "<whatever x='no-entity' />"
    ;

    final XMLStreamReader xmlStreamReader =
        xmlInputFactory.createXMLStreamReader( new StringReader( xml ) ) ;

    found: {
      while( xmlStreamReader.hasNext() ) {
        final int staxEvent = xmlStreamReader.next() ;
        if( staxEvent == XMLEvent.ENTITY_REFERENCE ) {
          LOGGER.info( "Got entity reference '" + xmlStreamReader.getLocalName() + "'." ) ;
          break found ;
        }
      }
      fail( "Found no entity reference." ) ;
    }
  }


// =======
// Fixture
// =======

  private static final Logger LOGGER = LoggerFactory.getLogger( StaxPlayground.class ) ;
}

All I get is an exception:

com.fasterxml.aalto.WFCException: Unexpanded ENTITY_REFERENCE (replace-me) in attribute value
 at [row,col {unknown-source}]: [2,26]

	at com.fasterxml.aalto.in.XmlScanner.reportInputProblem(XmlScanner.java:1333)
	at com.fasterxml.aalto.in.XmlScanner.reportUnexpandedEntityInAttr(XmlScanner.java:1343)
	at com.fasterxml.aalto.in.ReaderScanner.collectValue(ReaderScanner.java:901)
	at com.fasterxml.aalto.in.ReaderScanner.handleStartElement(ReaderScanner.java:794)
	at com.fasterxml.aalto.in.ReaderScanner.nextFromProlog(ReaderScanner.java:236)
	at com.fasterxml.aalto.stax.StreamReaderImpl.next(StreamReaderImpl.java:790)
	at com.otcdlink.chiron.wire.StaxPlayground.entityReplacement(StaxPlayground.java:44)

I just figured out that ENTITY_REFERENCE does work inside an element's text. When the test case parses "<whatever>&replace-me;</whatever>", the ENTITY_REFERENCE event is reported.

Is there any way to hook into entity resolution when parsing an attribute?

I'm looking at ReaderScanner's code around lines 1066 and 897, and the parser clearly intends an undefined entity in an attribute value to fail. Sounds like bad news for me.

OK, I got it. I should ask the parser to leave entities unresolved and resolve them on my own, instead of relying on ENTITY_REFERENCE, since it's probably not supposed to work with attributes.

The problem is, disabling entity resolution is not yet possible. I'm opening another issue for that.
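In the meantime, one parser-independent stopgap might be to expand the custom entities in the raw text before it reaches the parser. The sketch below is only that: `EntityPreprocessor`, `expand`, and `escape` are hypothetical names, not part of Aalto or StAX, and the approach works on plain text, so it would also rewrite references inside comments or CDATA sections. It leaves the five predefined entities, character references, and unknown names untouched.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not an Aalto API: expands custom entities before parsing.
public class EntityPreprocessor {

  // Matches named references such as &replace-me; but not character references like &#38;.
  private static final Pattern NAMED_REFERENCE =
      Pattern.compile( "&([A-Za-z_:][A-Za-z0-9_:.\\-]*);" ) ;

  // Entities every XML parser already knows; these are left alone.
  private static final Set< String > PREDEFINED =
      new HashSet<>( Arrays.asList( "lt", "gt", "amp", "apos", "quot" ) ) ;

  public static String expand( final String xml, final Map< String, String > entities ) {
    final Matcher matcher = NAMED_REFERENCE.matcher( xml ) ;
    final StringBuffer result = new StringBuffer() ;
    while( matcher.find() ) {
      final String name = matcher.group( 1 ) ;
      final String replacement ;
      if( PREDEFINED.contains( name ) || ! entities.containsKey( name ) ) {
        // Keep the reference as written; the parser will handle or reject it.
        replacement = matcher.group() ;
      } else {
        // Escape the value so it stays well-formed inside an attribute or text node.
        replacement = escape( entities.get( name ) ) ;
      }
      matcher.appendReplacement( result, Matcher.quoteReplacement( replacement ) ) ;
    }
    matcher.appendTail( result ) ;
    return result.toString() ;
  }

  public static String escape( final String value ) {
    // The ampersand must be replaced first so later escapes are not escaped again.
    return value
        .replace( "&", "&amp;" )
        .replace( "<", "&lt;" )
        .replace( ">", "&gt;" )
        .replace( "'", "&apos;" )
        .replace( "\"", "&quot;" )
    ;
  }
}
```

With this in place, the test case above could pass `EntityPreprocessor.expand( xml, entities )` to the `StringReader` instead of `xml`, at the cost of holding the whole document in memory and losing the original column positions in error messages.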