mrkkrp/megaparsec

Error context with additional source positions

Opened this issue · 4 comments

Hello, I'm trying to implement a error helper for unmatched delimiters, which will add the source location and quoted source from the suspected unmatched opening delimiter when the close delimiter parser fails.

This is what I have so far, which almost works, but it would be much better for the closed delimiter error to come first. Is there a way to quote some additional source given the offset in showErrorComponent?

parens :: Parser a -> Parser a
parens p = do
    d <- matchOpenDelimiter (void openParen) "("
    p <* d (void closeParen)

matchOpenDelimiter :: Parser a -> Text -> Parser (Parser b -> Parser ())
matchOpenDelimiter pOpenDelim openDelimLabel = do
    openDelimOffset <- getOffset
    void pOpenDelim
    pure $ \pCloseDelim -> do
        res <- observing $ void pCloseDelim
        case res of
            Left e -> do
                region (setErrorOffset openDelimOffset)
                    $ registerFancyFailure
                    (Set.fromList
                     [ErrorFail $ "context: possible unmatched "
                          <> T.unpack openDelimLabel])
                parseError e
            Right () -> pure ()

What it outputs when it fails:

( ( a ) 
1:1:
  |
1 | ( ( a ) 
  | ^
context: possible unmatched (

1:9:
  |
1 | ( ( a ) 
  |         ^
unexpected end of input
expecting ')'

Goal is to output something like this:

1:9:
  |
1 | ( ( a ) 
  |         ^
unexpected end of input
expecting ')'

( ( a ) 
1:1:
  |
1 | ( ( a ) 
  | ^
context: possible unmatched (

I think you'd be better served by a custom error which you could create via the region combinator. Here is a crude example:

{-# LANGUAGE ImportQualifiedPost #-}
{-# LANGUAGE LambdaCase #-}
{-# LANGUAGE OverloadedStrings #-}
{-# LANGUAGE TupleSections #-}

module Foo where

import Data.List.NonEmpty (NonEmpty (..))
import Data.Set qualified as Set
import Data.Text (Text)
import Data.Void (Void)
import Text.Megaparsec
import Text.Megaparsec.Char
import Text.Megaparsec.Char.Lexer qualified as L

type Parser = Parsec CustomError Text

data CustomError = UnmatchedParen Int Int
  deriving (Eq, Ord)

instance ShowErrorComponent CustomError where
  showErrorComponent _ = "possibly unmatched ')'"
  errorComponentLen (UnmatchedParen start end) = end - start

lexeme :: Parser a -> Parser a
lexeme = L.lexeme space

parens :: Parser a -> Parser a
parens p = do
  o <- getOffset
  lexeme (string "(")
  region (reifyUnmatchedParen o) $ do
    p <* lexeme (string ")")

reifyUnmatchedParen ::
  Int ->
  ParseError Text CustomError ->
  ParseError Text CustomError
reifyUnmatchedParen startOffset = \case
  x@(TrivialError endOffset (Just EndOfInput) expected) ->
    if Tokens (')' :| []) `Set.member` expected
      then
        FancyError
          startOffset
          ( Set.singleton
              ( ErrorCustom
                  (UnmatchedParen startOffset endOffset)
              )
          )
      else x
  x -> x

myTest :: Parser Char
myTest = parens (parens (lexeme (char 'a')))

In action:

ghci> parseTest myTest "( (a)  "
1:1:
  |
1 | ( (a)
  | ^^^^^^^
possibly unmatched ')'

Of course refiyUnmatchedParen would need to be a little bit more intelligent and the UnmatchedParen constructor could as well be enhanced to preserve more details about the context, but the main idea would be this.

Yes, I'd like to use a custom error, this would be more morally correct than having two separate errors also. I think I failed to give a good enough example, here's a better one - a subquery in an SQL statement:

select *
from (select a,b
      from t
      where a = 1
            and b > a
2:6:
  |
2 | from (select a,b
  |      ^
context: possible unmatched (

5:22:
  |
5 |             and b > a
  |                      ^
unexpected end of input
expecting having, ), group by, or order by

This is the motivation for two separate bits of quoted source in the error message. I think this kind of error message is also good for reporting an unclosed block comment, or an unclosed multiline string.

Your suggestion as it stands only quotes from the start offset to the end of that first line - and it would not be ideal if it was fixed to quote all the lines between the first location and the last location. I think it's impossible for the parser to tell if the user would be better served seeing the suspect opening delimiter position, or the position where it fails to find the closing delimiter, which is why I want to show both.

I also originally tried something that worked the way you suggest in that it observes all the parsing between the open delimiter and the close delimiter, but this doesn't fit with my SQL parser code, which I don't think can easily be converted to this style, e.g. see this parser which parses a app-like suffix in SQL, which has many interesting variations: https://github.com/JakeWheat/simple-sql-parser/blob/c11bee4a9cc81e75fb8ce7d2f538ab5b0ab6615b/Language/SQL/SimpleSQL/Parse.hs#L956 . This was the motivation for attempting to return something from the wrapper of the open delimiter, which would then be applied to the close delimiter parser and not the whole intermediate parser. It may be that I'm merely not writing very good parser code in this case though.

The only thing I can think of here is to still introduce a custom error, but make it output only the position where ) might be missing (the end of the region in my example). Next, take errorBundlePretty as a starting point and make a custom version of it (possibly fixed for your particular type of parse errors) that would be able to quote from the input at arbitrary positions (this is not something that is readily available by default because the input stream is supposed to be consumed in a single pass as the error messages are rendered).

What I'd like to do is adapt the context example from the Megaparsec tutorial, and add offsets to the context entries. Then I can use a custom pretty printer no problem.

But I'm still using the work around of adding an extra error instead - this is because if mergeError sees a FancyError and a TrivialError, it throws the TrivialError away. I'd really like it to add both to a Set, like it does with two FancyErrors. This is so it can easily show an error with context, combined with all the gathered expecting items that you'd see without using any custom errors.