gallais/idris-tparsec

Bug in the string literal parser?

gallais opened this issue · 1 comments

While I was porting the JSON parser to Agda yesterday I think I have
found a bug in the string literal parser:

$ rand (char '\\')
$ box $ andoptbind anyTok $ \ c =>
if c /= into 'u' then fail else unicode

After having read a \ we will only accept a u followed by 4 hexadecimal
digits aka. a unicode character. Shouldn't we also accept:

  • another \
  • a "
  • a r, n, or t
    ?

Ok this is pretty sneaky: andoptbind anyTok $ \ c => ... means we'll consume
any token and succeed even if the continuation fails (in which case we return
(c, Nothing)). This means that we accept any character following a \, and
only have a special behaviour if it happens to be u aka the start of a unicode
character.

Not sure how I feel about that.