commonmark/cmark

Odd parsing of `**A*B*C*`

Closed this issue · 3 comments

Three CommonMark parsers parse `**A*B*C*` differently:

cmark          : `<em><em>A</em>B</em>C*`
commonmark.js  : `*<em>A<em>B</em>C</em>`
comrak         : `**A<em>B</em>C*`

I think (but could easily be wrong...) that the commonmark.js version is correct, following the reasoning outlined in kivikakk/comrak#217:

  1. First we consider `**`. It can't end an emph, as there are no earlier delimiters.
  2. Next we consider the `*` in `A*B`. It is both left- and right-flanking, so it can start or end emph.
     We look backwards from there and find the starting `**`. This can start emph, but we're not allowed to use it, since
     the lengths of `**` and `*` add to 3. There are no earlier entries, so we move on.
  3. Next we consider the `*` in `B*C`, again both left- and right-flanking. Searching backward we hit the `*` in `A*B`. The sum of the lengths is not 3, so we can use it. This means we now have `**A<em>B</em>C*`. We move on.
  4. Finally we reach the ending `*`. It is only right-flanking, so it can only end emph. Searching backward we find the initial `**`. The `**` can only start emph and the final `*` can only end emph, so the sum-to-3 issue does not occur, and they match, giving `*<em>A<em>B</em>C</em>`.
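
The pairing checks in the steps above can be sketched in a few lines of Python. This is only an illustration, not code from any of the three parsers: the function names `delimiter_runs` and `may_pair` are made up here, and flanking is simplified on the assumption that the input contains only letters, digits and `*`. The length check follows the CommonMark spec wording (sum is a multiple of 3, unless both lengths are multiples of 3), which for this input is the same as "add to 3".

```python
import re


def delimiter_runs(text):
    """Return (length, can_open, can_close) for each run of '*' in text.

    Simplifying assumption: text holds only ASCII letters/digits and '*',
    so a run is left-flanking iff a character follows it and
    right-flanking iff a character precedes it. Whitespace and
    punctuation are deliberately not handled.
    """
    runs = []
    for m in re.finditer(r"\*+", text):
        can_open = m.end() < len(text)   # something follows: left-flanking
        can_close = m.start() > 0        # something precedes: right-flanking
        runs.append((len(m.group()), can_open, can_close))
    return runs


def may_pair(opener, closer):
    """Check whether two runs may match as opener and closer."""
    o_len, o_open, o_close = opener
    c_len, c_open, c_close = closer
    if not (o_open and c_close):
        return False
    either_is_both = (o_open and o_close) or (c_open and c_close)
    if either_is_both and (o_len + c_len) % 3 == 0:
        # Allowed only when both lengths are themselves multiples of 3.
        return o_len % 3 == 0 and c_len % 3 == 0
    return True


first, second, third, last = delimiter_runs("**A*B*C*")
print(may_pair(first, second))  # False: step 2, lengths 2 + 1 = 3
print(may_pair(second, third))  # True:  step 3, lengths 1 + 1 = 2
print(may_pair(first, last))    # True:  step 4, neither run is both-flanking
```

The sketch only answers "may these two runs pair?"; it does not walk the delimiter stack or emit HTML.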

jgm commented

Yes, commonmark.js is correct. But cmark is correct too, at least the version I have:

% build/src/cmark 
**a*b*c*
<p>*<em>a<em>b</em>c</em></p>

jgm commented

This is with the current dev version, by the way.
I haven't tested other releases, but are you sure your cmark gives this result?

Oh - that's super embarrassing. I had pulled the most recent version but hadn't reset my HEAD to the new branch.
So I was testing with a 6-year-old (!) version of cmark. After updating to the right commit it all works fine.

Sorry, and thanks.