CADbloke/daisydiff

Empty IMG tag throws NullPointerException


What steps will reproduce the problem?
    Initiate a parse with a source containing an empty img tag ("<img>")

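    Example code:
        import java.io.StringReader;
        import org.outerj.daisy.diff.helper.NekoHtmlParser;
        import org.outerj.daisy.diff.html.dom.DomTreeBuilder;
        import org.xml.sax.InputSource;

        String html = "<img>"; // img tag with no attributes
        NekoHtmlParser cleaner = new NekoHtmlParser();
        InputSource inputSource = new InputSource(new StringReader(html));
        DomTreeBuilder handler = new DomTreeBuilder();
        cleaner.parse(inputSource, handler); // throws NullPointerException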

What is the expected output? What do you see instead?
    Expected output is that the parse completes successfully and populates the DomTreeBuilder.
    Actual output is a NullPointerException in org.outerj.daisy.diff.html.dom.ImageNode.<init>

What version of the product are you using? On what operating system?
    Daisy Diff 1.2 with Java 6 and Java 7 on Linux and Windows.


Please provide any additional information below.
    Adding a src attribute to the img tag causes it to parse correctly.

    We are parsing user-supplied data, sometimes pasted from other applications, which is how we wound up with an img tag with no attributes.
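
    Since adding a src attribute avoids the failure, one possible workaround until this is fixed is to put a small SAX filter in front of DomTreeBuilder that supplies an empty src on any img start tag that lacks one. The sketch below uses only the standard org.xml.sax classes. ImgSrcDefaultingHandler is a hypothetical name, not part of Daisy Diff, and the sketch assumes that ImageNode reads src from the attributes passed with the start tag. It has not been verified against Daisy Diff 1.2.

```java
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.XMLFilterImpl;

/**
 * Hypothetical helper (not part of Daisy Diff): forwards all SAX events
 * unchanged, except that an img start tag without a src attribute is
 * given src="" before being passed on.
 */
class ImgSrcDefaultingHandler extends XMLFilterImpl {

    ImgSrcDefaultingHandler(ContentHandler delegate) {
        // XMLFilterImpl forwards every content event to this handler.
        setContentHandler(delegate);
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
        // Parsers may report the tag name in either field and in either case.
        boolean isImg = "img".equalsIgnoreCase(localName) || "img".equalsIgnoreCase(qName);
        if (isImg && atts.getValue("src") == null) {
            // Copy the original attributes and append an empty src.
            AttributesImpl patched = new AttributesImpl(atts);
            patched.addAttribute("", "src", "src", "CDATA", "");
            atts = patched;
        }
        super.startElement(uri, localName, qName, atts);
    }
}
```

    With this in place, the call in the example code would become cleaner.parse(inputSource, new ImgSrcDefaultingHandler(handler)). An empty src is an arbitrary placeholder; whether an attribute-less image should instead be dropped from the diff is a design decision for the library.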

    Stack Trace:
java.lang.NullPointerException
    at org.outerj.daisy.diff.html.dom.ImageNode.<init>(Unknown Source)
    at org.outerj.daisy.diff.html.dom.DomTreeBuilder.endElement(Unknown Source)
    at org.outerj.daisy.diff.helper.MergeCharacterEventsHandler.endElement(Unknown Source)
    at org.outerj.daisy.diff.helper.NekoHtmlParser$RemoveNamespacesHandler.endElement(Unknown Source)
    at org.apache.xerces.parsers.AbstractSAXParser.endElement(Unknown Source)
    at org.apache.xerces.parsers.AbstractXMLDocumentParser.emptyElement(Unknown Source)
    at org.cyberneko.html.filters.DefaultFilter.emptyElement(DefaultFilter.java:148)
    at org.cyberneko.html.filters.NamespaceBinder.emptyElement(NamespaceBinder.java:302)
    at org.cyberneko.html.HTMLTagBalancer.startElement(HTMLTagBalancer.java:617)
    at org.cyberneko.html.HTMLScanner$ContentScanner.scanStartElement(HTMLScanner.java:2637)
    at org.cyberneko.html.HTMLScanner$ContentScanner.scan(HTMLScanner.java:2012)
    at org.cyberneko.html.HTMLScanner.scanDocument(HTMLScanner.java:910)
    at org.cyberneko.html.HTMLConfiguration.parse(HTMLConfiguration.java:499)
    at org.cyberneko.html.HTMLConfiguration.parse(HTMLConfiguration.java:452)
    at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
    at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
    at org.outerj.daisy.diff.helper.NekoHtmlParser.parse(Unknown Source)



Original issue reported on code.google.com by mejari on 31 Jan 2013 at 6:37