jgm/commonmark-hs

commonmark_x reader mishandles ~~strikeout~~

dubiousjim opened this issue · 4 comments

Explain the problem.

printf 'abc ~~struck~~ def' | pandoc -f$READER

will correctly produce <p>abc <del>struck</del> def</p> when READER is commonmark+strikeout or gfm. But when READER is commonmark_x, it instead gives <p>abc <sub><sub>struck</sub></sub> def</p>.

commonmark_x says it includes the +strikeout extension, but just in case, I verified that commonmark_x+strikeout also gives the wrong output.

Pandoc version?
pandoc 2.17.1.1, binary release from the GitHub releases. On Mac OS 10.15.7.

jgm commented

This is partly due to a problem in pandoc: we currently give priority to subscript; this can be fixed by moving the strikethroughSpec below the subscriptSpec in the list of extensions in Text.Pandoc.Readers.CommonMark. Unfortunately, when we do that, subscript no longer works at all.

The reason has to do with the architecture of the commonmark library, which assumes that a single syntax spec will handle both single and double delimiter matches of a certain kind (here, ~). Instead, we have two handlers for ~. The subscript handler specifies a fallback behavior if there is a "double delimiter" match (~~..~~), and the strikethrough handler specifies a fallback behavior if there is a single delimiter match. The fallback is just to pass through the delimiters literally. So, once strikethrough is enabled and given priority over subscript, ~hi~ parses as ~hi~ rather than <sub>hi</sub>.

This can be illustrated using the commonmark executable from commonmark-hs:

% commonmark -xsubscript -xstrikethrough
~~hi~~ ~hi~
<p><del>hi</del> ~hi~</p>
% commonmark -xstrikethrough -xsubscript
~~hi~~ ~hi~
<p><sub><sub>hi</sub></sub> <sub>hi</sub></p>
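
The behavior in the two runs above can be reproduced with a small toy model. This is only a sketch in Python, not the commonmark-hs implementation: the names `Spec`, `tag`, and `render` are hypothetical, and it assumes that each delimiter character ends up with exactly one active spec, with the later-listed spec replacing the earlier one (which is what the transcript suggests). A delimiter run is resolved by trying the double handler first, then the single handler, and otherwise passing the delimiters through literally.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Optional

Wrap = Callable[[str], str]


@dataclass
class Spec:
    """Hypothetical stand-in for a delimiter spec for the '~' character."""
    single: Optional[Wrap] = None  # handler for ~x~
    double: Optional[Wrap] = None  # handler for ~~x~~


def tag(name: str) -> Wrap:
    """Return a function that wraps its argument in an HTML tag."""
    return lambda inner: f"<{name}>{inner}</{name}>"


subscript = Spec(single=tag("sub"))      # knows only single matches
strikethrough = Spec(double=tag("del"))  # knows only double matches


def render(text: str, specs: List[Spec]) -> str:
    """Resolve ~...~ and ~~...~~ spans using only the last spec listed.

    Assumption of this toy model: one active spec per delimiter
    character, and a later spec replaces an earlier one.
    """
    spec = specs[-1]

    def resolve(match: "re.Match[str]") -> str:
        remaining = len(match.group(1))
        out = match.group(2)
        while remaining > 0:
            if remaining >= 2 and spec.double:
                out = spec.double(out)
                remaining -= 2
            elif spec.single:
                out = spec.single(out)
                remaining -= 1
            else:
                # No handler for this run length: keep delimiters literally.
                out = "~" * remaining + out + "~" * remaining
                remaining = 0
        return out

    return re.sub(r"(~{1,2})([^~]+)\1", resolve, text)


print(render("~~hi~~ ~hi~", [subscript, strikethrough]))
# <del>hi</del> ~hi~
print(render("~~hi~~ ~hi~", [strikethrough, subscript]))
# <sub><sub>hi</sub></sub> <sub>hi</sub>

# A single spec that handles both run lengths, as the architecture expects:
combined = Spec(single=tag("sub"), double=tag("del"))
print(render("~~hi~~ ~hi~", [combined]))
# <del>hi</del> <sub>hi</sub>
```

In this model neither ordering of the two separate specs gives both results; only a spec that carries both a single and a double handler for ~ does.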

jgm commented

I think I'll move this over to commonmark-hs, because we need to fix this issue there before doing anything with pandoc.

jgm commented

If you build pandoc HEAD from source, this will now be fixed.

Thanks for the quick attention!