html5lib/html5lib-python

_inputstream.EncodingParser fails to parse short bogus comments (prescan)

openandclose opened this issue · 1 comments

I think abrupt-closing-of-empty-comment ('<!-->' and '<!--->')
should be treated as normal comments in prescan as well.

for '<!-->', the spec states rather explicitly.

The two 0x2D bytes can be the same as those in the '<!--' sequence.'

prescan:
https://html.spec.whatwg.org/multipage/parsing.html#prescan-a-byte-stream-to-determine-its-encoding
abrupt-closing-of-empty-comment:
https://html.spec.whatwg.org/multipage/parsing.html#parse-error-abrupt-closing-of-empty-comment

C.f.
jsdom/html-encoding-sniffer has the same bug.
jsdom/html-encoding-sniffer#4
Validator treats them as comments.

Confirmed the spec says to treat them as comments.