mrkkrp/megaparsec

How to make a recoverable indentation block parser

I'm writing an indentation parser and I want to recover from one specific type of error: indentation that is larger than the reference indentation, but not equal to the indentation set by the block parser.

Ex:

f ()
  x = 1
  y = 2
    z = 5
 pass
      n = 1
  return

As you can see, it's obvious that the statements are correct and belong to the same block. It's just that the indentation is all weird.

I want to report the error and continue parsing "block items" after this, including the incorrectly indented statement.
This would require me to catch the error at some point during the execution of indentBlock.

What would be the best approach to do that?

  • use indentedItems (which is not currently exported), get the indentation ref and call indentedItems after an error?
  • write a specific recoverableIndentBlock which handles this exact case? <-- I'll attempt this one

Okay, to answer my own question of whether indentBlock can be rewritten to be recoverable: yes. The final implementation, which is specialized to my parser but can easily be generalized, is this:

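{-# LANGUAGE MultiWayIf #-}

import Data.Maybe (fromMaybe, isJust)
import qualified Data.Set as Set
import Text.Megaparsec
import qualified Text.Megaparsec.Char as C
import qualified Text.Megaparsec.Char.Lexer as L

-- Parser and scn come from the surrounding parser code, which is not shown here.

recoverableIndentBlock ::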
  Parser (L.IndentOpt Parser a b) ->
  Parser a
recoverableIndentBlock r = do
  scn
  ref <- L.indentLevel
  a <- r
  case a of
    L.IndentNone x -> x <$ scn
    L.IndentMany indent f p -> do
      mlvl <- (optional . try) (C.eol *> L.indentGuard scn GT ref)
      done <- isJust <$> optional eof
      case (mlvl, done) of
        (Just lvl, False) ->
          indentedItems ref (fromMaybe lvl indent) p >>= f
        _ -> scn *> f []
    L.IndentSome indent f p -> do
      pos <- C.eol *> L.indentGuard scn GT ref
      let lvl = fromMaybe pos indent
      x <-
        if
          | pos <= ref -> L.incorrectIndent GT ref pos
          | pos == lvl -> p
          | otherwise -> L.incorrectIndent EQ lvl pos
      xs <- indentedItems ref lvl p
      f (x : xs)

indentedItems ::
  -- | Reference indentation level
  Pos ->
  -- | Level of the first indented item ('lookAhead'ed)
  Pos ->
  -- | How to parse indented tokens
  Parser b ->
  Parser [b]
indentedItems ref lvl p = go
  where
    go = do
      scn
      pos <- L.indentLevel
      done <- isJust <$> optional eof
      if done
        then return []
        else
          if
            | pos <= ref -> return []
            | pos == lvl -> (:) <$> p <*> go
            | otherwise -> do
              -- Indentation differs from lvl: record a delayed error, then parse the item anyway.
              o <- getOffset
              registerParseError $ FancyError o $ Set.singleton $ ErrorIndentation EQ lvl pos
              (:) <$> p <*> go
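
To see the recovery in action, here is a minimal, self-contained sketch that runs the same kind of loop on the example from the first post. Parser, scn, statement, block, items, errorCount and example are all stand-ins made up for this illustration, since the post does not show the real definitions. The loop is written with plain if/when instead of MultiWayIf, and the block parser is deliberately much simpler than recoverableIndentBlock.

```haskell
import Control.Applicative (empty)
import Control.Monad (when)
import Data.Maybe (isJust)
import qualified Data.Set as Set
import Data.Void (Void)
import Text.Megaparsec
import qualified Text.Megaparsec.Char as C
import qualified Text.Megaparsec.Char.Lexer as L

-- Assumed stand-in for the parser type used in the post.
type Parser = Parsec Void String

-- Assumed space consumer that also skips newlines (no comment syntax).
scn :: Parser ()
scn = L.space C.space1 empty empty

-- Hypothetical item parser: a statement is the rest of the current line.
statement :: Parser String
statement = takeWhile1P (Just "statement") (/= '\n')

-- Same idea as indentedItems above: a mis-indented item registers a
-- delayed error and is then parsed like any other item.
items :: Pos -> Pos -> Parser b -> Parser [b]
items ref lvl p = go
  where
    go = do
      scn
      pos <- L.indentLevel
      done <- isJust <$> optional eof
      if done || pos <= ref
        then pure []
        else do
          when (pos /= lvl) $ do
            o <- getOffset
            registerParseError
              (FancyError o (Set.singleton (ErrorIndentation EQ lvl pos)))
          (:) <$> p <*> go

-- Simplified block: a header line, then items; the first item sets the level.
block :: Parser (String, [String])
block = do
  scn
  ref <- L.indentLevel
  header <- statement
  scn
  lvl <- L.indentLevel
  xs <- items ref lvl statement
  pure (header, xs)

-- Number of errors reported for an input (0 when it parses cleanly).
errorCount :: String -> Int
errorCount input =
  case runParser (block <* eof) "example" input of
    Left bundle -> length (bundleErrors bundle)
    Right _ -> 0

-- The example from the first post.
example :: String
example =
  unlines
    [ "f ()"
    , "  x = 1"
    , "  y = 2"
    , "    z = 5"
    , " pass"
    , "      n = 1"
    , "  return"
    ]
```

In GHCi, errorCount example should give 3, one error each for z = 5, pass and n = 1, which shows that parsing carried on after the first mistake. Note that runParser returns Left whenever a delayed error has been registered, so the parsed items are not returned alongside the errors; errorBundlePretty can be used on the bundle to print them.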

I still have a question, though: is this the idiomatic way to do it? Maybe it would be nice to add something like this to the library?