commonmark/cmark

Incorrect emphasis handling

Closed · 1 comment

mity commented

(Distilled from https://talk.commonmark.org/t/i-dont-understand-how-emphasis-is-parsed/3866)

Input:

*****Hello*world****

Actual Output:

<p>*****Hello<em>world</em>***</p>

Expected Output:

<p>**<em><strong>Hello<em>world</em></strong></em></p>
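The expected output depends on how each run of * is classified by the spec's flanking rules: for *, a left-flanking run can open emphasis and a right-flanking run can close it. The sketch below is a simplified reading of those rules. Punctuation is approximated as ASCII punctuation plus Unicode "P*" categories, and `delimiter_runs` is an illustrative helper, not part of cmark. It shows that ***** can only open, **** can only close, and the lone * between Hello and world can do both.

```python
import string
import unicodedata

# Sketch only: a simplified model of CommonMark flanking rules, not cmark's code.


def _is_whitespace(ch):
    # The start or end of the line counts as whitespace for flanking purposes
    return ch is None or ch.isspace()


def _is_punctuation(ch):
    # Approximation (assumption): ASCII punctuation plus Unicode "P*" categories
    return ch is not None and (
        ch in string.punctuation or unicodedata.category(ch).startswith("P")
    )


def delimiter_runs(text, delim="*"):
    """Return (start, length, left_flanking, right_flanking) for each run of `delim`."""
    runs = []
    i = 0
    while i < len(text):
        if text[i] != delim:
            i += 1
            continue
        j = i
        while j < len(text) and text[j] == delim:
            j += 1
        before = text[i - 1] if i > 0 else None
        after = text[j] if j < len(text) else None
        left = not _is_whitespace(after) and (
            not _is_punctuation(after) or _is_whitespace(before) or _is_punctuation(before)
        )
        right = not _is_whitespace(before) and (
            not _is_punctuation(before) or _is_whitespace(after) or _is_punctuation(after)
        )
        runs.append((i, j - i, left, right))
        i = j
    return runs


for start, length, left, right in delimiter_runs("*****Hello*world****"):
    # For '*', left-flanking means "can open" and right-flanking means "can close"
    print(start, length, "opener" if left else "-", "closer" if right else "-")
# 0 5 opener -
# 10 1 opener closer
# 16 4 - closer
```

Because the middle * can both open and close, the spec's mod-3 restriction applies to it: ***** and * cannot pair (5 + 1 = 6 is a multiple of 3), but * and **** can (1 + 4 = 5).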

More detailed rationale can be found in this comment: https://talk.commonmark.org/t/i-dont-understand-how-emphasis-is-parsed/3866/8

jgm commented

Reading the algorithm at the end of the spec, I think I see the issue. We have an openers_bottom table that limits how far back we have to look for an opener. It is indexed by the delimiter type (_ or *) and the length of the closing delimiter mod 3. So after we fail to match the opener ***** with the closer * (the match is rejected because * can both open and close and 5 + 1 = 6 is a multiple of 3), we set openers_bottom for (*, 1) to the location of *. This effectively removes ***** as a possible opener for any run of *s whose length mod 3 is 1, including the final **** in this example. This procedure ignores the fact that the length-mod-3 restriction only matters if one of the delimiters can be both an opener and a closer.
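To make the failure concrete, here is a simplified Python model of the emphasis pass. All names are hypothetical and this is not cmark's C implementation. Nodes are built by hand (uid is the character offset in the input), and links, stack_bottom and other bookkeeping are omitted. With `fixed=False` it keys openers_bottom on the delimiter character and the original run length mod 3, which reproduces the output reported in the issue. With `fixed=True` it also keys on whether the closer can open, one possible fix suggested by the observation that the mod-3 check only matters for delimiters that can both open and close.

```python
from dataclasses import dataclass, replace


@dataclass
class Delim:
    uid: int        # position in the source; increases left to right
    char: str       # "*" or "_"
    count: int      # delimiter characters not yet consumed
    orig_len: int   # length of the original run (used by the mod-3 check)
    can_open: bool
    can_close: bool


def render(nodes):
    # Unmatched delimiters fall back to literal text
    return "".join(n if isinstance(n, str) else n.char * n.count for n in nodes)


def rule_of_three_ok(o, c):
    # Only applies when one of the two runs can both open and close
    if (o.can_open and o.can_close) or (c.can_open and c.can_close):
        return (o.orig_len + c.orig_len) % 3 != 0 or (
            o.orig_len % 3 == 0 and c.orig_len % 3 == 0
        )
    return True


def process_emphasis(nodes, fixed=True):
    """Simplified emphasis pass; fixed=False keys openers_bottom the old way."""
    nodes = [replace(n) if isinstance(n, Delim) else n for n in nodes]  # copy delims
    openers_bottom = {}  # key -> uid; openers at or below it are never considered
    i = 0
    while i < len(nodes):
        closer = nodes[i]
        if not isinstance(closer, Delim) or not closer.can_close:
            i += 1
            continue
        key = (closer.char, closer.orig_len % 3)
        if fixed:
            key += (closer.can_open,)  # hypothetical fix: also key on "can open"
        bottom = openers_bottom.get(key, -1)
        opener_idx = None
        for j in range(i - 1, -1, -1):
            o = nodes[j]
            if not isinstance(o, Delim):
                continue
            if o.uid <= bottom:
                break  # reached openers_bottom for this key: stop looking
            if o.char == closer.char and o.can_open and rule_of_three_ok(o, closer):
                opener_idx = j
                break
        if opener_idx is None:
            # No opener: bound future searches for this key at the element before the closer
            earlier = [n.uid for n in nodes[:i] if isinstance(n, Delim)]
            openers_bottom[key] = earlier[-1] if earlier else -1
            if not closer.can_open:
                nodes[i] = closer.char * closer.count  # drop it from the stack
            i += 1
            continue
        opener = nodes[opener_idx]
        use = 2 if opener.count >= 2 and closer.count >= 2 else 1
        tag = "strong" if use == 2 else "em"
        opener.count -= use
        closer.count -= use
        # Wrap everything between opener and closer; inner delimiters become text
        nodes[opener_idx + 1:i] = [f"<{tag}>{render(nodes[opener_idx + 1:i])}</{tag}>"]
        i = opener_idx + 2  # the closer's new index
        if opener.count == 0:
            del nodes[opener_idx]
            i -= 1
        if closer.count == 0:
            del nodes[i]  # current position moves on to the next node
    return render(nodes)


# Hand-built nodes for "*****Hello*world****"
source = [Delim(0, "*", 5, 5, True, False), "Hello",
          Delim(10, "*", 1, 1, True, True), "world",
          Delim(16, "*", 4, 4, False, True)]
print(process_emphasis(source, fixed=False))  # *****Hello<em>world</em>***
print(process_emphasis(source))               # **<em><strong>Hello<em>world</em></strong></em>
```

The extra key component is sound under this model because whether a closer is compatible with a given opener depends only on the closer's character, its length mod 3, and whether it can also open. Splitting the table on that flag keeps a failed search by the both-ways * from lowering the bound used by the closer-only ****.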