[fuzz result] code span vanishes when link destination is `
notriddle opened this issue · 5 comments
This markdown:
[link](`)`x`
In most engines I've tried with, including GitHub, it does this:
link
x
commonmark-hs generates this:
<p><a href="`">link</a>`x`</p>
Events from pulldown-cmark:
"[^](`)`|`\n" -> [
Start(Paragraph)
Start(Link { link_type: Inline, dest_url: Borrowed("`"), title: Borrowed(""), id: Borrowed("") })
Text(Borrowed("^"))
End(Link)
Code(Borrowed("|"))
End(Paragraph)
]
Events from pandoc:
"[^](`)`|`\n" -> [
Start(Paragraph)
Start(Link { link_type: Inline, dest_url: Inlined(InlineStr { inner: [96, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], len: 1 }), title: Inlined(InlineStr { inner: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], len: 0 }), id: Borrowed("") })
Text(Boxed("^"))
End(Link)
Text(Boxed("`|`"))
End(Paragraph)
]
Events from commonmark.js:
"[^](`)`|`\n" -> [
Start(Paragraph)
Start(Link { link_type: Inline, dest_url: Inlined(InlineStr { inner: [96, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], len: 1 }), title: Inlined(InlineStr { inner: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], len: 0 }), id: Borrowed("") })
Text(Boxed("^"))
End(Link)
Code(Boxed("|"))
End(Paragraph)
]
The code that looks like the culprit is here:
commonmark-hs/commonmark/src/Commonmark/Inlines.hs
Lines 840 to 843 in 1875e9a
In this case, the end paren is in the middle of a "parsed code" chunk, which gets split into the link destination and some plain text.
v end paren
[link](`)`x`
:^^^-- parsed chunk
:| code span
: suffix pos
The simplest solution is to make inline code bind tighter than link destinations, but this wouldn't match the reference implementation. The other simple options are to re-parse with the inline syntax parsers, or to interleave bracket matching with inline code span parsing.
None of those are simple and obvious fixes, unfortunately.
Reparsing seems ugly but maybe it's the best idea.
I'm not sure what the last option would involve -- it sounds like a pretty major architectural change, though!
The three options that come to my mind are:
- You can't have unescaped
`
in link destinations. This is the easiest, and doesn't match cmark.c. - Re-parsing is the least invasive, but still ugly, and I'm not convinced it's actually correct.
- Add another kind of Chunk, for
`
runs, then turn it into code spans inprocessBs
. This matches closest with how pulldown-cmark does it (MaybeCode
,MaybeLinkOpen
, andMaybeImage
are all prepared in the tree building class, thenhandle_inline_pass1
turns them into spans, thenhandle_emphasis_and_hard_break
... handles emphasis and hard line breaks in a second inline pass), and since it means there's a clean separation between "definitely a code span" and "a valid delimiter that might become a code span", it seems more likely to be correct.
- You can't have unescaped ` in link destinations. This is the easiest, and doesn't match cmark.c.
This would require a spec change.
As for options 2 and 3, I'm not sure. I agree that 2 is ugly. So 3 has some appeal, but I'd have to see what is actually involved in going this way.