commonmark/commonmark-spec

(Intentional?) inconsistency between 4.6 block HTML and 6.6 raw HTML comments

wooorm opened this issue · 10 comments

The block HTML algorithm (4.6) allows <!-->, <!--->, etc. as comments.
These are also accepted as comments by the HTML parser (13.2.5.44, the case for U+002D).
(Note: there are a couple of cases, such as <!> and <!->, that HTML also allows but treats as parse errors; I am not talking about those.)

The “inline” algorithm (6.6), however, does not allow <!--> or <!--->. These look a lot like comments, so I don’t expect people to depend on them being treated as text. The behavior is also inconsistent with blocks. Can we change the spec to allow them?
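To make the inconsistency concrete, here is a minimal Python sketch. It assumes the inline rule is worded as in CommonMark 0.30 (an HTML comment is <!-- + text + -->, where text does not start with > or ->, does not end with -, and does not contain --) and that the block rule is start/end condition 2 (the line begins with <!--, the line contains -->). The helper names are hypothetical, and the block check skips the up-to-three-spaces indentation the spec permits.

```python
# Hypothetical helpers contrasting the two comment rules; the wording is
# assumed from CommonMark 0.30 (4.6 start/end condition 2, 6.6 HTML comment).

def inline_comment_old(s: str) -> bool:
    """Inline rule: '<!--' + text + '-->', where text does not start with
    '>' or '->', does not end with '-', and does not contain '--'."""
    # Require at least 7 chars so the opener and closer cannot overlap.
    if len(s) < 7 or not (s.startswith("<!--") and s.endswith("-->")):
        return False
    text = s[4:-3]
    return not (
        text.startswith((">", "->"))
        or text.endswith("-")
        or "--" in text
    )


def block_comment_on_one_line(line: str) -> bool:
    """Block rule, simplified: start condition 2 (line begins with '<!--')
    and end condition 2 (line contains '-->') both hold on the first line,
    so the HTML block ends on that line. Indentation is not handled."""
    return line.startswith("<!--") and "-->" in line


for sample in ["<!-->", "<!--->", "<!-- hi -->"]:
    print(sample, "block:", block_comment_on_one_line(sample),
          "inline:", inline_comment_old(sample))
# <!--> block: True inline: False
# <!---> block: True inline: False
# <!-- hi --> block: True inline: True
```

The block check matches because --> may overlap the opener's dashes, while the inline rule needs a separate opener and closer.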

I can do the work.

jgm commented

Yes, I'm in favor.

Good to hear! One more thing I was wondering about: -- inside a comment is a similar case. For example, <!-- some stuff -- some more stuff -->. Is that OK too?

jgm commented

If I recall correctly, we deliberately simplified the comment parsing (even though this diverges from the HTML standard). I don't remember why, though. I'm okay with implementing something closer to the standard as long as it doesn't increase complexity too much, in either the spec or parsers.

I don’t know why that was the case either! Perhaps it made sense if you care more about XML than HTML? (XML does not allow -- inside comments.)

In my case, this change only removes states from my state machine that are needed for inline comments but not for block ones.
I can see people writing -- in comments, so this might even count as a bug fix.
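A sketch of why a relaxed rule is simpler to scan, assuming it is phrased as: <!-->, <!--->, or <!-- followed by any text up to the first -->. Under that assumption a scanner only has to remember how many dashes it has just seen, and -- in the middle of a comment needs no special handling. The function and state names are hypothetical; this is not the state machine from any particular parser.

```python
from enum import Enum, auto
from typing import Optional


class State(Enum):
    BODY = auto()       # inside the comment, no dash just seen
    DASH = auto()       # exactly one '-' just seen
    DASH_DASH = auto()  # two or more '-' in a row just seen


def scan_comment_relaxed(src: str, pos: int = 0) -> Optional[int]:
    """Hypothetical scanner for the relaxed rule: '<!-->', '<!--->', or
    '<!--' + text + '-->' where text does not contain '-->'.
    Returns the index just past the comment, or None if there is none."""
    if not src.startswith("<!--", pos):
        return None
    # The opener's own '--' counts, so '<!-->' and '<!--->' close at once.
    state = State.DASH_DASH
    for i in range(pos + 4, len(src)):
        ch = src[i]
        if ch == "-":
            state = State.DASH if state is State.BODY else State.DASH_DASH
        elif ch == ">" and state is State.DASH_DASH:
            return i + 1  # the first '-->' ends the comment
        else:
            state = State.BODY  # any other char, or '>' after a single '-'
    return None  # unterminated: not a comment


print(scan_comment_relaxed("<!-->"))                 # 5
print(scan_comment_relaxed("<!-- a -- b -->"))       # 15
print(scan_comment_relaxed("<!-- unterminated ->"))  # None
```

The old inline rule would need extra states on top of this to reject > or -> right after the opener, a trailing - before the closer, and -- anywhere in the body.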

jgm commented

For reference, the HTML5 spec for comments:
https://html.spec.whatwg.org/multipage/syntax.html#comments

Thanks for merging this, John!

jgm commented

Reopening until we get the issue of <!--> and <!---> (not to mention <!-- hi -->) sorted out. See comments on linked PR.

jgm commented

I think an inconsistency between the block and inline cases is okay, given that the spec for block HTML allows invalid HTML.

jgm commented

However, allowing -- inside HTML comments is a change worth making.

commented in the PR: #713 (comment).