Odd parsing of `**A*B*C*`

Question

Closed this issue 2 years ago · 3 comments

Three CommonMark parsers parse `**A*B*C*` differently:

cmark          : `<em><em>A</em>B</em>C*`
commonmark.js  : `*<em>A<em>B</em>C</em>`
comrak         : `**A<em>B</em>C*`

I think (but could easily be wrong...) that the commonmark.js output is correct, by the reasoning outlined in kivikakk/comrak#217:

First we consider `**`. It can't end emphasis, as there are no earlier delimiters.

Next we consider the `*` in `A*B`. It is both left- and right-flanking, so it can start or end emphasis.
We look backward from there and find the starting `**`. This can start emphasis, but we're not allowed to use it:
the lengths of `**` and `*` add to 3, a multiple of 3, which the spec disallows when either delimiter can both open and close. There are no earlier entries, so we move on.

Next we consider the `*` in `B*C`, again both left- and right-flanking. Searching backward we hit the `*` in `A*B`. The sum of the lengths is 2, not 3, so we can use it. This means we now have `**A<em>B</em>C*`. We move on.

Finally we reach the ending `*`. It is only right-flanking, so it can only end emphasis. Searching backward we find the initial `**`. The `**` can only start emphasis and the final `*` can only end it, so the sum-to-3 issue does not occur and they match, giving `*<em>A<em>B</em>C</em>`.
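
To make the walkthrough above checkable, here is a minimal Python sketch of the matching procedure. It is not the code of cmark, commonmark.js, or comrak. It handles only `*` runs, treats only ASCII punctuation as punctuation, and ignores everything else in the spec (`_`, code spans, links, escapes). All names (`Delim`, `scan`, `blocked`, `render_emphasis`, `rule_of_three`) are made up for illustration.

```python
import string
from dataclasses import dataclass

PUNCT = set(string.punctuation)  # ASCII punctuation only (a simplification)


@dataclass
class Delim:
    n: int            # stars in this run that are still unused
    orig: int         # original run length, used by the rule of 3
    can_open: bool    # left-flanking
    can_close: bool   # right-flanking
    active: bool = True


def scan(text):
    """Split text into literal strings and '*' delimiter runs."""
    items, i = [], 0
    while i < len(text):
        j = i
        if text[i] == "*":
            while j < len(text) and text[j] == "*":
                j += 1
            # Start and end of the line count as whitespace.
            before = text[i - 1] if i > 0 else " "
            after = text[j] if j < len(text) else " "
            left = not after.isspace() and (
                after not in PUNCT or before.isspace() or before in PUNCT)
            right = not before.isspace() and (
                before not in PUNCT or after.isspace() or after in PUNCT)
            items.append(Delim(j - i, j - i, left, right))
        else:
            while j < len(text) and text[j] != "*":
                j += 1
            items.append(text[i:j])
        i = j
    return items


def blocked(opener, closer):
    """Rule of 3: only applies if either run can both open and close."""
    both_ways = opener.can_close or closer.can_open
    total = opener.orig + closer.orig
    return both_ways and total % 3 == 0 and not (
        opener.orig % 3 == 0 and closer.orig % 3 == 0)


def render_emphasis(text, rule_of_three=True):
    items = scan(text)
    i = 0
    while i < len(items):
        closer = items[i]
        if not (isinstance(closer, Delim) and closer.can_close and closer.n > 0):
            i += 1
            continue
        # Search backward for the nearest usable opener.
        j = i - 1
        while j >= 0:
            o = items[j]
            if (isinstance(o, Delim) and o.active and o.can_open and o.n > 0
                    and not (rule_of_three and blocked(o, closer))):
                break
            j -= 1
        if j < 0:
            i += 1  # no opener found; leftover stars stay literal
            continue
        opener = items[j]
        use = 2 if opener.n >= 2 and closer.n >= 2 else 1
        tag = "strong" if use == 2 else "em"
        opener.n -= use
        closer.n -= use
        # Runs between the matched pair can no longer act as openers.
        for between in items[j + 1:i]:
            if isinstance(between, Delim):
                between.active = False
        items.insert(j + 1, f"<{tag}>")
        items.insert(i + 1, f"</{tag}>")
        i += 2  # the closer moved two slots; revisit it in case stars remain
    return "".join(x if isinstance(x, str) else "*" * x.n for x in items)


print(render_emphasis("**A*B*C*"))                       # *<em>A<em>B</em>C</em>
print(render_emphasis("**A*B*C*", rule_of_three=False))  # <em><em>A</em>B</em>C*
```

With the rule enabled, the sketch reproduces the commonmark.js output. With it disabled, the sketch produces the same string as the cmark line in the table above. That is consistent with the difference coming from the sum-to-3 rule, but it does not show how any particular cmark build behaves.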

Answer 1 · 2022-06-24T16:25:38.000Z

Yes, commonmark.js is correct. But cmark is correct too, at least the version I have:

% build/src/cmark 
**a*b*c*
<p>*<em>a<em>b</em>c</em></p>

Answer 2 · 2022-06-24T16:26:35.000Z

This is with the current dev version, by the way.
I haven't tested other releases, but are you sure your cmark gives this result?

Answer 3 · 2022-06-25T00:06:47.000Z

Oh - that's super embarrassing. I had pulled the most recent version but hadn't reset my HEAD to the new branch.
So I was testing with a 6-year-old (!) version of cmark. After updating to the right commit it all works fine.

Sorry, and thanks.