Markdown: Awkward soft break after abbreviation between ( and newline
fiapps opened this issue · 3 comments
Test case:
echo '(cf.
Foo)' | pandoc -f markdown -t markdown
Output: ( cf. Foo)
A space has been added after the open parenthesis. More precisely, the native output format shows that it is a SoftBreak: [Para [Str "(",SoftBreak,Str "cf.\160Foo)"]]
This is a sufficiently rare case that it occurred only once in a 350-page document.
This is actually a pretty common bug if you use hard line wrapping in your source document. It occurs any time a line in your source document ends in an abbreviation that is immediately preceded by an opening parenthesis:
Lorem (e.g.
ipsum)
produces output
Lorem ( e.g. ipsum)
I hard wrap at 78 characters in my source documents. On average for me, this produces ~3 errors per 8,000 words and of course it affects all output formats.
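To get a rough idea of how often a hard-wrapped source hits this, the condition described above can be sketched as a small checker. This is only an illustration: `sampleAbbrevs`, `splitLastToken`, `triggersBug` and `affectedLines` are hypothetical names, the abbreviation list is a tiny stand-in for pandoc's real one, and trailing whitespace is not handled.

```haskell
import Data.Char (isAlphaNum)
import qualified Data.Set as Set

-- Hypothetical, deliberately tiny abbreviation list (an assumption for
-- illustration; pandoc's default list is longer).
sampleAbbrevs :: Set.Set String
sampleAbbrevs = Set.fromList ["cf.", "e.g.", "i.e.", "vs."]

-- Split a line into (text before the final token, final token), where a
-- token is a run of alphanumerics and periods at the end of the line.
splitLastToken :: String -> (String, String)
splitLastToken line = (reverse restRev, reverse tokRev)
  where
    (tokRev, restRev) = span isTokenChar (reverse line)
    isTokenChar c = isAlphaNum c || c == '.'

-- True when the line ends in a known abbreviation that directly follows a
-- non-space character (such as '(') and the next line starts with an
-- alphanumeric character.
triggersBug :: String -> String -> Bool
triggersBug line next =
  token `Set.member` sampleAbbrevs
    && not (null before)
    && last before /= ' '
    && startsAlphaNum next
  where
    (before, token) = splitLastToken line
    startsAlphaNum (c:_) = isAlphaNum c
    startsAlphaNum []    = False

-- 1-based numbers of the lines that would trigger the problem.
affectedLines :: String -> [Int]
affectedLines doc =
  [ n | (n, l, nxt) <- zip3 [1 ..] ls (drop 1 ls), triggersBug l nxt ]
  where
    ls = lines doc
```

For instance, `affectedLines "Lorem (e.g.\nipsum)"` flags line 1, while a line ending in `see e.g.` is not flagged because the abbreviation is preceded by a space.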
A possible workaround is to use --abbreviations=/dev/null (or another empty file).
Here's the relevant code (in str in the Markdown reader):
    abbrevs <- getOption readerAbbreviations
    if not (null result) && last result == '.' && result `Set.member` abbrevs
       then try (do ils <- whitespace <|> endline
                    lookAhead alphaNum
                    return $ do
                      ils' <- ils
                      if ils' == B.space
                         then return (B.str result <> B.str "\160")
                         else -- linebreak or softbreak
                              return (ils' <> B.str result <> B.str "\160"))
            <|> return (return (B.str result))
       else return (return (B.str result)))
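The logic is this: when an abbreviation is followed by a space, we replace the space with a nonbreaking space. When it is followed by a line break (soft or hard), we replace the break with a nonbreaking space and move the line break before the abbreviation. That gives bad results when the abbreviation isn't itself preceded by a space: the moved break then sits between the preceding character (here the open parenthesis) and the abbreviation, and shows up as a space in the output.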
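The rule can be reproduced in a small self-contained model. This is a sketch, not pandoc's actual code: `Inline`, `Follow`, `abbrevInlines`, `render` and `example` are simplified, hypothetical stand-ins, and the real reader merges adjacent strings (hence `Str "cf.\160Foo)"` in the native output above).

```haskell
-- Simplified stand-ins for pandoc's inline elements (not the real types).
data Inline = Str String | Space | SoftBreak
  deriving (Eq, Show)

-- What follows the abbreviation in the source text.
data Follow = FollowSpace | FollowNewline
  deriving (Eq, Show)

-- Mirrors the rule in the excerpt: a following space becomes a nonbreaking
-- space; a following line break is moved in front of the abbreviation and
-- a nonbreaking space is emitted after it.
abbrevInlines :: String -> Follow -> [Inline]
abbrevInlines abbr FollowSpace   = [Str abbr, Str "\160"]
abbrevInlines abbr FollowNewline = [SoftBreak, Str abbr, Str "\160"]

-- Render inlines on a single line, showing a soft break as an ordinary
-- space (as happens when the output is not wrapped at that point).
render :: [Inline] -> String
render = concatMap go
  where
    go (Str s)   = s
    go Space     = " "
    go SoftBreak = " "

-- The test case "(cf.\nFoo)" as this model would parse it.
example :: [Inline]
example = [Str "("] ++ abbrevInlines "cf." FollowNewline ++ [Str "Foo)"]
```

Here `render example` gives `( cf.` followed by a nonbreaking space and `Foo)`: the soft break that was moved in front of `cf.` is what produces the unwanted space after the parenthesis. When the abbreviation is preceded by a space in the source, the moved break simply takes that position and the result looks fine.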