commonmark_x reader mishandles ~~strikeout~~
dubiousjim opened this issue · 4 comments
Explain the problem.
printf 'abc ~~struck~~ def' | pandoc -f$READER
will correctly produce <p>abc <del>struck</del> def</p> when READER is commonmark+strikeout or gfm. But when READER is commonmark_x, it instead gives <p>abc <sub><sub>struck</sub></sub> def</p>.
commonmark_x says it includes the +strikeout extension, but just in case, I verified that commonmark_x+strikeout also gives the wrong output.
Pandoc version?
pandoc 2.17.1.1, binary release from the GitHub releases. On Mac OS 10.15.7.
This is partly due to a problem in pandoc: we currently give priority to subscript. This can be fixed by moving the strikethroughSpec below the subscriptSpec in the list of extensions in Text.Pandoc.Readers.CommonMark. Unfortunately, when we do that, subscript no longer works at all.
The reason has to do with the architecture of the commonmark library, which assumes that a single syntax spec will handle both single and double delimiter matchings of a certain kind (here, ~). Instead, we have two handlers for ~. The subscript handler specifies a fallback behavior if there is a "double delimiter" match (~~..~~), and the strikethrough handler specifies a fallback behavior if there is a single delimiter match. The fallback is just to pass through the delimiters literally. So, once strikethrough is enabled and given priority over subscript, ~hi~ parses as ~hi~ rather than <sub>hi</sub>. This can be illustrated using the commonmark executable from commonmark-hs:
% commonmark -xsubscript -xstrikethrough
~~hi~~ ~hi~
<p><del>hi</del> ~hi~</p>
% commonmark -xstrikethrough -xsubscript
~~hi~~ ~hi~
<p><sub><sub>hi</sub></sub> <sub>hi</sub></p>
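The two transcripts above can be mimicked with a toy model. The Python sketch below is not how commonmark-hs is implemented. It only assumes that exactly one handler ends up owning ~, and it hard-codes each handler's behavior to match the output shown above. The function names (render_with_owner, render_unified), the regexes, and the rule that delimited text contains no whitespace are all made up for illustration. The last function shows what a single spec covering both delimiter lengths might produce, which is presumably the desired result rather than a description of the eventual fix.

```python
import re

# Hypothetical, simplified patterns: delimited text may not contain
# whitespace or further tildes. Real CommonMark delimiter rules are richer.
SINGLE = re.compile(r"~([^~\s]+)~")      # ~x~
DOUBLE = re.compile(r"~~([^~\s]+)~~")    # ~~x~~


def render_with_owner(text: str, owner: str) -> str:
    """Toy model of two competing handlers where only `owner` sees '~'."""
    if owner == "strikethrough":
        # Double delimiters become <del>; single delimiters fall back
        # to literal text, so ~hi~ is left untouched.
        out = DOUBLE.sub(r"<del>\1</del>", text)
    elif owner == "subscript":
        # Only single-delimiter matches exist for this handler, so ~~x~~
        # is consumed one tilde pair at a time and ends up nested.
        out, prev = text, None
        while out != prev:
            prev = out
            out = SINGLE.sub(r"<sub>\1</sub>", out)
    else:
        raise ValueError(f"unknown owner: {owner!r}")
    return f"<p>{out}</p>"


def render_unified(text: str) -> str:
    """Toy model of one handler that knows both delimiter lengths."""
    out = DOUBLE.sub(r"<del>\1</del>", text)   # try the double match first
    out = SINGLE.sub(r"<sub>\1</sub>", out)    # then the single match
    return f"<p>{out}</p>"


if __name__ == "__main__":
    sample = "~~hi~~ ~hi~"
    print(render_with_owner(sample, "strikethrough"))
    print(render_with_owner(sample, "subscript"))
    print(render_unified(sample))
```

With owner set to "strikethrough" the model returns the output of the first transcript, and with "subscript" it returns the output of the second. Neither ordering gets both constructs right, which is why reordering the specs in pandoc alone cannot fix the issue.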
I think I'll move this over to commonmark-hs, because we need to fix this issue there before doing anything with pandoc.
If you build pandoc HEAD from source, this will now be fixed.
Thanks for the quick attention!