LoadHTML leads to wrong result when node begins with underscore
mausoma opened this issue · 2 comments
1. Description
var html = @"<_links>A</_links>";
var htmlInput = new HtmlAgilityPack.HtmlDocument();
htmlInput.LoadHtml(html);
Console.WriteLine(html);
Console.WriteLine(htmlInput.DocumentNode.OuterHtml);
LoadHtml doesn't load the XML correctly. The output is:
<_links>A</_links>
<_links>A
2. Exception
I would expect the input and OuterHtml to be (more or less) the same, but the </_links> end tag is missing completely.
4. Any further technical details
- HAP version: 1.11.54
- .NET Framework 4.7.2
Hello @mausoma,
HtmlAgilityPack is an HTML parser. A tag in HTML cannot start with an underscore.
Best Regards,
Jon
In addition to what Jonathan already pointed out, you can check whether the input data was parsed without errors: after loading it, inspect HtmlDocument.ParseErrors for any reported errors.
In your case, HtmlDocument.ParseErrors should contain an error relating to the invalid <_links> element. Unfortunately, the reported error is a bit misleading: it says "Start tag <_links> was not found" rather than reporting "_links" as an invalid element name. :-(
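A minimal sketch of that check, reusing the input from the issue. The class name is only for illustration, and the exact error code and message may differ between HAP versions, so treat whatever it prints as informational:

```csharp
using System;
using HtmlAgilityPack;

class ParseErrorCheck // hypothetical name, for illustration only
{
    static void Main()
    {
        var doc = new HtmlDocument();
        doc.LoadHtml("<_links>A</_links>");

        // ParseErrors collects the problems the parser noticed while loading.
        // An empty sequence means no errors were reported.
        foreach (HtmlParseError error in doc.ParseErrors)
        {
            Console.WriteLine(
                $"{error.Code} at line {error.Line}, position {error.LinePosition}: {error.Reason}");
        }
    }
}
```

If the loop prints anything, the input was not parsed cleanly, and OuterHtml may not round-trip to the original markup.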
(P.S.: I am just a user and not affiliated with the HAP project or its maintainers.)