mrkkrp/megaparsec

Wrong source locations on `unexpected end of input` with custom tokens

sol opened this issue · 2 comments

sol commented

Given the example code at https://markkarpov.com/tutorial/megaparsec.html#working-with-custom-input-streams, if I modify `exampleStream` to

```haskell
exampleStream :: MyStream
exampleStream = MyStream
  "5 + 6"
  [ at 1 1 (Int 5)
  , at 1 3 Plus         -- (1)
  ]
  where
    at  l c = WithPos (at' l c) (at' l (c + 1)) 2
    at' l c = SourcePos "" (mkPos l) (mkPos c)
```

then I get a wrong source location in the error message:

```
ghci> parseTest (pSum <* eof) exampleStream
1:1:
  |
1 | 5 +
  | ^
unexpected end of input
expecting integer
```

I haven't investigated any further, so I'm not sure whether it's an issue with the instance definitions from the tutorial or with megaparsec itself. From what I tried, it seems to work fine with character-based parsers.

mrkkrp commented

Thanks for spotting this! The problem is in the definition of the `reachOffset` method in the `TraversableStream MyStream` instance. When, after splitting the stream, no tokens follow the location of the error, it should not default to the previous position (here, 1:1); instead it should perhaps assume the end position of the span of the last consumed token, e.g.:

```diff
@@ -20,7 +20,9 @@ instance TraversableStream MyStream where
       sameLine = sourceLine newSourcePos == sourceLine pstateSourcePos
       newSourcePos =
         case post of
-          [] -> pstateSourcePos
+          [] -> case unMyStream pstateInput of
+            [] -> pstateSourcePos
+            xs -> endPos (last xs)
           (x:_) -> startPos x
       (pre, post) = splitAt (o - pstateOffset) (unMyStream pstateInput)
       (preStr, postStr) = splitAt tokensConsumed (myStreamInput pstateInput)
```
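The effect of this patch can be checked in isolation. The following is a minimal sketch that models only the position-selection logic, not the whole `reachOffset` method. `Pos` and `Tok` are hypothetical simplified stand-ins for megaparsec's `SourcePos` and the tutorial's `WithPos`, and `oldPos`/`newPos` are illustrative names that are not part of megaparsec or the tutorial. The split mirrors the tutorial's `splitAt (o - pstateOffset)`.

```haskell
-- Hypothetical, simplified stand-ins: only line/column and the token
-- span are modelled, not megaparsec's real SourcePos or WithPos.
data Pos = Pos { line :: Int, column :: Int }
  deriving (Eq, Show)

data Tok = Tok { startPos :: Pos, endPos :: Pos }
  deriving (Eq, Show)

-- Behaviour before the patch: if no token follows offset `o`,
-- fall back to the position already stored in the PosState.
oldPos :: Pos -> Int -> [Tok] -> Int -> Pos
oldPos statePos stateOffset toks o =
  case post of
    []      -> statePos
    (x : _) -> startPos x
  where
    (_, post) = splitAt (o - stateOffset) toks

-- Behaviour after the patch: at end of input, use the end of the
-- last token; keep the stored position only if there are no tokens.
newPos :: Pos -> Int -> [Tok] -> Int -> Pos
newPos statePos stateOffset toks o =
  case post of
    [] -> case toks of
      [] -> statePos
      xs -> endPos (last xs)
    (x : _) -> startPos x
  where
    (_, post) = splitAt (o - stateOffset) toks

-- The two tokens of the truncated stream: "5" at 1:1 and "+" at 1:3,
-- each ending one column later, as with the `at` helper above.
exampleToks :: [Tok]
exampleToks = [at 1 1, at 1 3]
  where
    at l c = Tok (Pos l c) (Pos l (c + 1))
```

Loaded into GHCi, `oldPos (Pos 1 1) 0 exampleToks 2` (the error offset after both tokens are consumed) evaluates to `Pos {line = 1, column = 1}`, the bogus `1:1`, while `newPos (Pos 1 1) 0 exampleToks 2` evaluates to `Pos {line = 1, column = 4}`, the `1:4` reported once the fix is applied.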

Then, assuming you also change the "original user input" in exampleStream to match the tokens:

```haskell
exampleStream :: MyStream
exampleStream = MyStream
  "5 +"
  [ at 1 1 (Int 5)
  , at 1 3 Plus         -- (1)
  ]
  where
    at  l c = WithPos (at' l c) (at' l (c + 1)) 2
    at' l c = SourcePos "" (mkPos l) (mkPos c)
```

...it seems to work:

```
ghci> parseTest (pSum <* eof) exampleStream
1:4:
  |
1 | 5 +
  |    ^
unexpected end of input
expecting integer
```

I'm going to push a fix for that tutorial.

mrkkrp commented

This is now fixed.