rezakho/ganon

Does not recognize <!DOCTYPE html> as open HTML tag


What will reproduce the problem?
Trying to select the html node when the document uses the HTML5 doctype.

If 'file.html' starts with the HTML5 doctype and has no <html> start tag:
<!DOCTYPE html>
...
</html>

$html = file_get_dom('file.html');   // assuming the file is loaded with Ganon's file_get_dom()
$html_node = $html('html', 0);
echo gettype($html_node);     // prints "NULL"


However, if the document has an explicit <html> start tag:

<html>
...
</html>

it works as intended.



What is the expected output? What do you see instead?


Which version are you using?


Please provide any additional information below.


Original issue reported on code.google.com by bruc...@gmail.com on 5 Dec 2012 at 8:56

Are you sure the first example is valid HTML?

http://www.w3schools.com/tags/tag_doctype.asp
http://dev.w3.org/html5/spec/single-page.html#the-doctype

"The <!DOCTYPE> declaration is not an HTML tag; it is an instruction to the web 
browser about what version of HTML the page is written in."

Do you want Ganon to try to recover the html node from the closing tag?

Original comment by niels....@gmail.com on 7 Dec 2012 at 5:53

Yes, my mistake. It's not an HTML tag per se.
However, a document without an <html> start tag can still be valid. On these validators:

http://validator.w3.org/nu/
http://validator.w3.org/check

The following validates:
<!DOCTYPE html>
<head>
<title></title>
</head>
<body>
</body>
</html> 


So perhaps it would be nice for Ganon to parse "<!DOCTYPE html>" as an opening 
HTML tag and make it the root node?

Original comment by bruc...@gmail.com on 8 Dec 2012 at 1:52
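
Until the parser handles this case, one possible workaround is to patch the markup before parsing. The sketch below is only an illustration: ensure_html_open_tag is a hypothetical helper, not part of Ganon, and it assumes that inserting an explicit <html> start tag right after the doctype is enough for the 'html' selector to match (this has not been verified against Ganon itself). It does not add a missing </html> end tag.

```php
<?php
// Hypothetical helper (not part of Ganon): if the markup has no <html> start
// tag, insert one right after the doctype, or at the very start when there is
// no doctype, so that a selector such as 'html' has an element to match.
function ensure_html_open_tag(string $markup): string
{
    // An <html> start tag already exists: leave the markup untouched.
    // Note that "</html>" does not match this pattern.
    if (preg_match('/<html[\s>]/i', $markup) === 1) {
        return $markup;
    }

    // Insert "<html>" after the first doctype declaration, if there is one.
    $patched = preg_replace('/<!DOCTYPE[^>]*>/i', '$0<html>', $markup, 1, $count);
    if ($patched !== null && $count > 0) {
        return $patched;
    }

    // No doctype either: prepend the start tag.
    return '<html>' . $markup;
}

// The document from the last comment, which validates without an <html> start tag.
$source = "<!DOCTYPE html>\n<head>\n<title></title>\n</head>\n<body>\n</body>\n</html>";
$fixed  = ensure_html_open_tag($source);
```

The patched string in $fixed could then be handed to the parser instead of the original file contents, for example through Ganon's string loader (str_get_dom, assuming the installed version provides it).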