jamietre/HtmlParserSharp

XPath not working for SimpleHtmlParser resulting DOM Tree

cmwoods opened this issue · 1 comments

It would appear that the DOM Tree generated by the SimpleHtmlParser does not (for some reason I haven't been able to figure out) support any XPath functionality other than "//*".

I have a simple XHTML document that I load via SimpleHtmlParser's Parse method. I can walk the DOM Tree just fine but if I use SelectNodes() with anything but "//*" I end up with 0 results.

Loading the XHTML directly using XmlDocument's Load functionality results in a DOM that returns expected results for any given XPath query.

It appears that I found a workaround. For the SimpleHtmlParser DOM tree, I have to create an XmlNamespaceManager instance and map a non-empty prefix to the "http://www.w3.org/1999/xhtml" namespace. Then I have to use that prefix in the XPath expression and pass the namespace manager to my SelectNodes call.

System.Xml:

    XmlDocument xmlDoc = new XmlDocument();
    xmlDoc.Load(filepath);
    var Nodes = xmlDoc.SelectNodes("//tr"); // Yields expected non-zero result

vs. HtmlParserSharp:

    SimpleHtmlParser parser = new SimpleHtmlParser();
    XmlDocument xmlDoc2 = parser.Parse(filepath);
    // Bind a non-empty prefix to the XHTML namespace
    XmlNamespaceManager nsmgr = new XmlNamespaceManager(xmlDoc2.NameTable);
    nsmgr.AddNamespace("html", "http://www.w3.org/1999/xhtml");
    // Query the parsed document and pass the namespace manager along
    var Nodes = xmlDoc2.SelectNodes("//html:tr", nsmgr); // Now yields expected non-zero result

A little quirky. I guess that the parser engine is putting the elements into a namespace (my XHTML source document doesn't declare any)... At least I have a workaround.
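For anyone else hitting this, here is a minimal sketch of why the unprefixed query comes back empty. It assumes the parser's output behaves like a document whose elements all sit in the XHTML namespace; the document below is built by hand with LoadXml as a stand-in, not produced by SimpleHtmlParser. In XPath 1.0 an unprefixed name such as `tr` only matches elements in no namespace, so you either bind a prefix or compare on `local-name()`:

```csharp
using System;
using System.Xml;

class XPathNamespaceDemo
{
    static void Main()
    {
        // Hand-built stand-in (assumption): every element is in the XHTML namespace.
        var doc = new XmlDocument();
        doc.LoadXml(
            "<html xmlns=\"http://www.w3.org/1999/xhtml\"><body><table>" +
            "<tr><td>a</td></tr><tr><td>b</td></tr>" +
            "</table></body></html>");

        // Unprefixed name: matches only elements in no namespace.
        Console.WriteLine(doc.SelectNodes("//tr").Count);                  // prints 0

        // Prefix bound to the XHTML namespace, manager passed to SelectNodes.
        var nsmgr = new XmlNamespaceManager(doc.NameTable);
        nsmgr.AddNamespace("html", "http://www.w3.org/1999/xhtml");
        Console.WriteLine(doc.SelectNodes("//html:tr", nsmgr).Count);      // prints 2

        // Namespace-agnostic alternative: compare on the local name only.
        Console.WriteLine(doc.SelectNodes("//*[local-name()='tr']").Count); // prints 2
    }
}
```

The `local-name()` form avoids the namespace manager entirely, at the cost of more verbose expressions. To check whether this applies to your own document, inspecting `DocumentElement.NamespaceURI` on the parsed result should show whether a namespace is in play.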