jgm/pandoc

[Markdown Reader] Fails to parse citation immediately after an opening parenthesis

lierdakil opened this issue · 4 comments

This is prompted by lierdakil/pandoc-crossref#210

Long story short, I would expect pandoc to parse (@someCitation as

[Para [Str "("
,Cite
    [Citation {
       citationId = "someCitation"
     , citationPrefix = []
     , citationSuffix = []
     , citationMode = AuthorInText
     , citationNoteNum = 0
     , citationHash = 0
    }]
    [Str "@someCitation"]
  ]
]

This is not what actually happens though:
$ pandoc -f markdown -t native <<< '(@someCitation' yields

[Para [Str "(@someCitation"]]

I'm pretty sure the culprit is line 1414 here:

citeKey :: (Stream s m Char, HasLastStrPosition st)
        => ParserT s st m (Bool, String)
citeKey = try $ do
  guard =<< notAfterString
  suppress_author <- option False (True <$ char '-')
  char '@'
  firstChar <- alphaNum <|> char '_' <|> char '*' -- @* for wildcard in nocite
  let regchar = satisfy (\c -> isAlphaNum c || c == '_')
  let internal p = try $ p <* lookAhead regchar
  rest <- many $ regchar <|> internal (oneOf ":.#$%&-+?<>~/") <|>
                 try (oneOf ":/" <* lookAhead (char '/'))
  let key = firstChar:rest
  return (suppress_author, key)
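
To illustrate why that guard would reject the citation, here is a toy model in plain Haskell. It assumes that notAfterString succeeds only when the current position differs from the position where the last Str ended, which is what the name and the HasLastStrPosition constraint suggest. ToyState, lastStrEnd and notAfterStr are hypothetical names for this sketch, not pandoc's API.

```haskell
-- Toy model only; these names are illustrative and not part of pandoc.
newtype ToyState = ToyState { lastStrEnd :: Maybe Int }

-- True when the current offset is not where the last Str ended.
notAfterStr :: Int -> ToyState -> Bool
notAfterStr pos st = Just pos /= lastStrEnd st

main :: IO ()
main = do
  -- "(" was consumed as a Str ending at offset 1, and '@' sits at offset 1.
  print (notAfterStr 1 (ToyState (Just 1)))  -- False: the guard fails
  -- After "( " the last Str still ended at offset 1, but '@' sits at offset 2.
  print (notAfterStr 2 (ToyState (Just 1)))  -- True: the guard passes
```

Under this assumption, any Str ending right before the '@' blocks the citation, whether that Str is a word (as in an e-mail address) or just an opening parenthesis.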

While on the topic, this is also an issue in at least one other place where notAfterString is used. Consider this (with the smart extension):
pandoc -f markdown -t native <<< "('asd')"

[Para [Str "(\8217asd\8217)"]]

when I would expect something to the tune of

[Para [Str "(",Quoted SingleQuote [Str "asd"],Str ")"]]

instead.

jgm commented

OK, now I do remember. Here's what happens if that commit is reverted:

  Command:
    4635.md
      #1:                                                            FAIL (0.04s)
        test/Tests/Command.hs:54:
        
        ------------------------------------------------------------------------
        --- expected
        +++ pandoc -f markdown -t native
        +   1 [Para [Str "(",SoftBreak,Str "cf.\160foo)"]]
        -   1 [Para [Str "(cf.",SoftBreak,Str "foo)"]]
        ------------------------------------------------------------------------
      #2:                                                            FAIL (0.04s)
        test/Tests/Command.hs:54:
        
        ------------------------------------------------------------------------
        --- expected
        +++ pandoc -f markdown -t native
        +   1 [Para [Str "a",Space,Str "(",SoftBreak,Str "cf.\160foo)"]]
        -   1 [Para [Str "a",Space,Str "(cf.",SoftBreak,Str "foo)"]]
        ------------------------------------------------------------------------

So this has to do with #4635.

jgm commented

I agree that something should be done to fix this, but we have to avoid breaking #4635.

Perhaps adding a field recording the kind of last character parsed (i.e. symbol/alphanumeric) to the state and checking for that would suffice?
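
A minimal sketch of that idea, in plain Haskell with no pandoc dependencies. The LastChar type and the functions classify, lastCharKind and citeAllowedAfter are hypothetical names for illustration. The sketch only shows the check itself, not how it would be wired into the parser state or how it interacts with #4635.

```haskell
import Data.Char (isAlphaNum)

-- Hypothetical record of the kind of character that was parsed last.
data LastChar = NoChar | AlphaNumChar | SymbolChar
  deriving (Eq, Show)

-- Classify one character; anything non-alphanumeric counts as a symbol here.
classify :: Char -> LastChar
classify c
  | isAlphaNum c = AlphaNumChar
  | otherwise    = SymbolChar

-- Kind of the last character of the input consumed so far.
lastCharKind :: String -> LastChar
lastCharKind = foldl (\_ c -> classify c) NoChar

-- Assumed rule: a citation (or opening quote) may start unless the
-- preceding character was alphanumeric.
citeAllowedAfter :: LastChar -> Bool
citeAllowedAfter AlphaNumChar = False
citeAllowedAfter _            = True

main :: IO ()
main = do
  print (citeAllowedAfter (lastCharKind "("))    -- True:  "(@someCitation"
  print (citeAllowedAfter (lastCharKind "foo"))  -- False: "foo@bar"
  print (citeAllowedAfter (lastCharKind ""))     -- True:  start of input
```

With a check like this, "(" before "@" or "'" would no longer block the citation or the quote, while a preceding letter or digit still would.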