Odd parsing of `**A*B*C*`
Closed this issue · 3 comments
mikeando commented
Three CommonMark parsers parse `**A*B*C*` differently:
cmark : `<em><em>A</em>B</em>C*`
commonmark.js : `*<em>A<em>B</em>C</em>`
comrak : `**A<em>B</em>C*`
I think (but could easily be wrong...) that the commonmark.js version is correct due to the reasoning outlined in kivikakk/comrak#217
- First we consider `**`: it can't end an emph, as there are no earlier ones.
- Next we consider the `*` in `A*B`. It is both left- and right-flanking and so can start or end emph. We look backwards from there and find the starting `**`
  - this can start emph, but we're not allowed to use it since the lengths of `**` and `*` add to 3.
  - there are no earlier entries so we move on.
- Next we consider the `*` in `B*C` - again both left- and right-flanking. Searching backward we hit the `*` in `A*B` - the sum of the lengths is not 3 so we can use it. This means we now have `**A<em>B</em>C*`. We move on.
- Finally we reach the ending `*`. It is only right-flanking - so it can only end emph. Searching backward we find the initial `**`. The `**` can only start emph and the final `*` can only end emph, so the sum-to-3 issue does not occur - and they match, giving `*<em>A<em>B</em>C</em>`.
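The steps above can be sketched in a few lines of Python. This is only a minimal illustration, not the code of cmark, commonmark.js, or comrak. It handles `*` only, treats a run as left-flanking when the next character is not whitespace and right-flanking when the previous one is not whitespace (the punctuation rules are left out), and skips the `openers_bottom` optimisation. The names `Delim`, `tokenize`, `compatible`, `render`, and the `rule_of_three` switch are all made up for this example.

```python
from dataclasses import dataclass


@dataclass
class Delim:
    count: int        # delimiter characters still unused
    orig: int         # original length of the run (used by the rule of 3)
    can_open: bool    # simplified "left-flanking"
    can_close: bool   # simplified "right-flanking"
    active: bool = True


def tokenize(src):
    """Split src into plain strings and Delim objects for runs of '*'."""
    tokens, i = [], 0
    while i < len(src):
        j = i
        if src[i] == "*":
            while j < len(src) and src[j] == "*":
                j += 1
            before = src[i - 1] if i > 0 else " "   # start of line acts like whitespace
            after = src[j] if j < len(src) else " "  # end of line acts like whitespace
            tokens.append(Delim(j - i, j - i, not after.isspace(), not before.isspace()))
        else:
            while j < len(src) and src[j] != "*":
                j += 1
            tokens.append(src[i:j])
        i = j
    return tokens


def compatible(opener, closer, rule_of_three):
    """Apply the 'sum of lengths must not be a multiple of 3' restriction."""
    if not rule_of_three:
        return True
    if not (opener.can_close or closer.can_open):
        return True  # neither run can both open and close, so no restriction
    total = opener.orig + closer.orig
    return total % 3 != 0 or (opener.orig % 3 == 0 and closer.orig % 3 == 0)


def render(src, rule_of_three=True):
    tokens = tokenize(src)
    i = 0
    while i < len(tokens):
        closer = tokens[i]
        if not (isinstance(closer, Delim) and closer.can_close and closer.count > 0):
            i += 1
            continue
        # Search backward for the nearest usable opener.
        j = i - 1
        while j >= 0:
            t = tokens[j]
            if (isinstance(t, Delim) and t.active and t.can_open and t.count > 0
                    and compatible(t, closer, rule_of_three)):
                break
            j -= 1
        if j < 0:
            i += 1  # no opener found, move on to the next potential closer
            continue
        opener = tokens[j]
        n = 2 if opener.count >= 2 and closer.count >= 2 else 1
        tag = "strong" if n == 2 else "em"
        # Delimiters between opener and closer can no longer open anything.
        for k in range(j + 1, i):
            if isinstance(tokens[k], Delim):
                tokens[k].active = False
        opener.count -= n
        closer.count -= n
        tokens.insert(i, f"</{tag}>")      # close tag goes just before the closer
        tokens.insert(j + 1, f"<{tag}>")   # open tag goes just after the opener
        i += 2  # the closer shifted by two; revisit it in case characters remain
    # Unused delimiter characters are emitted literally (no HTML escaping here).
    return "".join(t if isinstance(t, str) else "*" * t.count for t in tokens)


print(render("**A*B*C*"))                       # *<em>A<em>B</em>C</em>
print(render("**A*B*C*", rule_of_three=False))  # <em><em>A</em>B</em>C*
```

With the length check switched off, this sketch produces the same string as the first output in the list of parser results. That is only an observation about the sketch, not a claim about how any real parser is implemented.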
jgm commented
Yes, commonmark.js is correct. But cmark is correct too, at least the version I have:
    % build/src/cmark
    **a*b*c*
    <p>*<em>a<em>b</em>c</em></p>
jgm commented
This is with the current dev version, by the way.
I haven't tested other releases, but are you sure your cmark gives this result?
mikeando commented
Oh - that's super embarrassing. I had pulled the most recent version but hadn't reset my HEAD to the new branch.
So I was testing with a 6-year-old (!) version of cmark. After updating to the right commit it all works fine.
Sorry, and thanks.