Exception thrown if HTML element contains XMLNS attribute
cmwoods opened this issue · 2 comments
cmwoods commented
I'm getting the following exception when the HTML element contains the XMLNS attribute (in an XHTML document):
Unhandled Exception: System.ArgumentException: The namespace declaration attribute has an incorrect 'namespaceURI': ''.
at System.Xml.XmlDocument.AddAttrXmlName(String prefix, String localName, String namespaceURI, IXmlSchemaInfo schemaInfo)
at System.Xml.XmlDocument.CreateAttribute(String prefix, String localName, String namespaceURI)
at System.Xml.XmlElement.SetAttribute(String localName, String namespaceURI, String value)
at HtmlParserSharp.XmlTreeBuilder.CreateHtmlElementSetAsRoot(HtmlAttributes attributes) in [...]\HtmlParserSharp\TreeBuilders\XmlTreeBuilder.cs:line 120
at HtmlParserSharp.Core.TreeBuilder`1.AppendHtmlElementToDocumentAndPush(HtmlAttributes attributes) in [...]\HtmlParserSharp\Core\TreeBuilder.cs:line 5237
at HtmlParserSharp.Core.TreeBuilder`1.StartTag(ElementName elementName, HtmlAttributes attributes, Boolean selfClosing) in [...]\HtmlParserSharp\Core\TreeBuilder.cs:line 2775
at HtmlParserSharp.Core.Tokenizer.EmitCurrentTagToken(Boolean selfClosing, Int32 pos) in [...]\HtmlParserSharp\Core\Tokenizer.cs:line 1155
at HtmlParserSharp.Core.Tokenizer.StateLoop(TokenizerState state, Char c, Int32 pos, Char[] buf, Boolean reconsume, TokenizerState returnState, Int32 endPos) in [...]\HtmlParserSharp\Core\Tokenizer.cs:line 2249
at HtmlParserSharp.Core.Tokenizer.TokenizeBuffer(UTF16Buffer buffer) in [...]\HtmlParserSharp\Core\Tokenizer.cs:line 1382
at HtmlParserSharp.SimpleHtmlParser.Tokenize(TextReader reader) in [...]\HtmlParserSharp\SimpleHtmlParser.cs:line 134
at HtmlParserSharp.SimpleHtmlParser.Parse(TextReader reader) in [...]\HtmlParserSharp\SimpleHtmlParser.cs:line 63
It looks like the code does not expect XHTML input and therefore has no special case for handling this attribute.
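The behaviour can probably be reproduced without HtmlParserSharp at all. The sketch below uses only System.Xml and assumes the root element is created in the XHTML namespace (I have not checked how XmlTreeBuilder actually creates it); XmlnsRepro and CanSetXmlns are names made up for this example.

```csharp
using System;
using System.Xml;

public static class XmlnsRepro
{
    // Returns true if XmlElement.SetAttribute accepts an "xmlns" attribute
    // declared with the given namespace URI, false if it throws ArgumentException.
    public static bool CanSetXmlns(string namespaceUri)
    {
        var doc = new XmlDocument();
        // Assumption: the root element lives in the XHTML namespace.
        XmlElement html = doc.CreateElement("html", "http://www.w3.org/1999/xhtml");
        try
        {
            html.SetAttribute("xmlns", namespaceUri, "http://www.w3.org/1999/xhtml");
            return true;
        }
        catch (ArgumentException)
        {
            return false;
        }
    }

    public static void Main()
    {
        // Empty namespace URI, as in the stack trace above.
        Console.WriteLine(CanSetXmlns(""));
        // The reserved namespace for namespace declarations.
        Console.WriteLine(CanSetXmlns("http://www.w3.org/2000/xmlns/"));
    }
}
```

If this matches the trace, the problem is the empty namespace URI passed along with the local name "xmlns", not the attribute value itself.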
cmwoods commented
Changed CreateHtmlElementSetAsRoot in XmlTreeBuilder.cs to have the following within the for loop (hack for error):
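string uri = attributes.GetURI(i);
// An "xmlns" declaration must be in the reserved xmlns namespace,
// otherwise XmlDocument throws the ArgumentException shown above.
if (attributes.GetLocalName(i) == "xmlns" && string.IsNullOrWhiteSpace(uri))
{
    uri = "http://www.w3.org/2000/xmlns/";
}
rv.SetAttribute(attributes.GetLocalName(i), uri, attributes.GetValue(i));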
I don't know whether this is the correct thing to do, but it at least gets around the issue.
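For anyone who wants to try the same idea outside the tree builder, here is a rough standalone sketch of the check. It is not the library's code: XmlnsWorkaround, ResolveAttributeNamespace and BuildRoot are hypothetical names, and the root element being in the XHTML namespace is an assumption.

```csharp
using System;
using System.Xml;

public static class XmlnsWorkaround
{
    // Reserved namespace that System.Xml requires for "xmlns" declarations.
    public const string XmlnsNamespace = "http://www.w3.org/2000/xmlns/";

    // Hypothetical helper mirroring the check in the patch above:
    // only an "xmlns" attribute with a blank URI is remapped.
    public static string ResolveAttributeNamespace(string localName, string uri)
    {
        if (localName == "xmlns" && string.IsNullOrWhiteSpace(uri))
        {
            return XmlnsNamespace;
        }
        return uri;
    }

    // Builds a one-element document with a single attribute and returns its markup.
    public static string BuildRoot(string attrName, string attrUri, string attrValue)
    {
        var doc = new XmlDocument();
        // Assumption: the root element is created in the XHTML namespace.
        XmlElement html = doc.CreateElement("html", "http://www.w3.org/1999/xhtml");
        doc.AppendChild(html);
        html.SetAttribute(attrName, ResolveAttributeNamespace(attrName, attrUri), attrValue);
        return doc.OuterXml;
    }

    public static void Main()
    {
        // The case from this issue: xmlns on the html element with no URI.
        Console.WriteLine(BuildRoot("xmlns", "", "http://www.w3.org/1999/xhtml"));
        // An ordinary attribute is passed through unchanged.
        Console.WriteLine(BuildRoot("lang", "", "en"));
    }
}
```

This only covers the unprefixed xmlns attribute, same as the hack above; prefixed declarations are not handled here.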