How to report "lexical" errors?
I'm building a parser that accepts a custom token stream.
I've made TokenStream (from lexer-applicative) an instance of Stream:
instance Stream (TokenStream (L tok)) where
That was wonderful, and everything worked as expected, until a "lexical error" appeared in my token stream:
-- | A stream of tokens
data TokenStream tok
  = TsToken tok (TokenStream tok)
  | TsEof
  | TsError LexicalError
The parser complained about unexpected end of input, because I had no choice but to treat TsError like TsEof.
I think there are 3 ways of solving this:
- Make Stream "aware" of these lexical errors: for example, let take1_ return an Either value instead of just a Maybe value.
- Make the parser incremental, so that users can check if the next token is TsError before feeding it to the parser.
- The "happy" way, something between 1. and 2.
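To make the first option a bit more concrete, here is a minimal sketch of what an error-aware take1_ could look like. It is not megaparsec's actual Stream class: take1E is a hypothetical name, and LexicalError is a stand-in (carrying just an Int position) for the type from lexer-applicative.

```haskell
-- Stand-in for lexer-applicative's LexicalError (assumption: only a position).
newtype LexicalError = LexicalError Int
  deriving (Eq, Show)

-- | A stream of tokens, as in the definition above.
data TokenStream tok
  = TsToken tok (TokenStream tok)
  | TsEof
  | TsError LexicalError
  deriving (Eq, Show)

-- | Hypothetical error-aware variant of take1_.
--   Left e          : the lexer failed at this point
--   Right Nothing   : genuine end of input
--   Right (Just ..) : the next token and the rest of the stream
take1E :: TokenStream tok -> Either LexicalError (Maybe (tok, TokenStream tok))
take1E (TsToken t rest) = Right (Just (t, rest))
take1E TsEof            = Right Nothing
take1E (TsError e)      = Left e
```

With a result type like this, the parser could tell "end of input" apart from "the lexer gave up here" instead of reporting both as unexpected end of input.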
I'll explain more about how it can be done in happy:
Happy also allows users to choose their own token stream type (usually with alex), as long as we tell happy what the token for eof is:
%lexer { <lexer> } { <eof> }
and what to do when a token comes in:
lexer :: (Token -> P a) -> P a
For example, this is how to deal with a token stream from lexer-applicative:
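lexer :: (Token -> P a) -> P a
lexer f = scanNext >>= f

-- Assumes P is a state + error monad (gets / modify / throwError from mtl)
-- and that its state is a record with a tokenStream field.
scanNext :: P Token
scanNext = do
  stream <- gets tokenStream
  case stream of
    TsToken (L _ tok) rest -> do
      -- advance the stream so the next call sees the following token
      modify (\s -> s { tokenStream = rest })
      return tok
    TsEof -> return TokenEOF
    TsError (LexicalError pos) -> throwError $ Lexical pos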
I think this is the best of the 3 solutions, because it lets users handle lexical errors the way they like, and it's not overkill like making megaparsec incremental.
But I'm still not sure how to incorporate this into the Stream class, if we are going to do this.
Should a token stream with an error in it be fed into a parser? You could just report the error because parsing won't succeed anyway.
Ideally, you would not know whether there's an error in a token stream until you keep extracting from the stream and finally encounter one.
The workaround I'm using now is to force the whole stream into a list, and see if there's any error.
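A rough sketch of that workaround, assuming the TokenStream shape shown earlier; collectTokens is a hypothetical helper name, and LexicalError is a stand-in for the lexer-applicative type.

```haskell
-- Stand-in for lexer-applicative's LexicalError (assumption: only a position).
newtype LexicalError = LexicalError Int
  deriving (Eq, Show)

data TokenStream tok
  = TsToken tok (TokenStream tok)
  | TsEof
  | TsError LexicalError

-- | Walk the whole stream up front: either the first lexical error,
--   or every token in order.
collectTokens :: TokenStream tok -> Either LexicalError [tok]
collectTokens (TsToken t rest) = (t :) <$> collectTokens rest
collectTokens TsEof            = Right []
collectTokens (TsError e)      = Left e
```

The resulting list can be handed to the parser only in the Right case. The cost is that the whole input is lexed before parsing starts, so the laziness of the stream is lost.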
I don't know if you still need this, but another workaround is to have type Token s = Either String tok and then throw a parser error whenever you get a Left. Unfortunately, it means you'll end up with expected tokens that are always Right _, so you could use a label for that instead.
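Here is one way that could look, as a sketch only. It assumes a recent megaparsec (7 or later) and uses a plain list of Either String Tok as the stream; Tok, tokenP, int, plus and expr are hypothetical names made up for this example.

```haskell
import qualified Data.Set as Set
import Data.Void (Void)
import Text.Megaparsec

-- Hypothetical token type, for illustration only.
data Tok = TInt Int | TPlus
  deriving (Eq, Ord, Show)

-- Each stream element is either a lexer error message or a real token.
type Parser = Parsec Void [Either String Tok]

-- | Match one token. If the next element is a Left, fail with the lexer's
--   message; otherwise use the label so that "expected" items show a
--   readable name rather than "Right _".
tokenP :: String -> (Tok -> Maybe a) -> Parser a
tokenP name pick = do
  next <- optional (lookAhead anySingle)
  case next of
    Just (Left msg) -> fail ("lexical error: " ++ msg)
    _ -> token (either (const Nothing) pick) Set.empty <?> name

int :: Parser Int
int = tokenP "integer" $ \t -> case t of
  TInt n -> Just n
  _      -> Nothing

plus :: Parser ()
plus = tokenP "'+'" $ \t -> case t of
  TPlus -> Just ()
  _     -> Nothing

-- | Sum of integers separated by '+'.
expr :: Parser Int
expr = sum <$> (int `sepBy1` plus) <* eof
```

For instance, parse expr "" [Right (TInt 1), Right TPlus, Right (TInt 2)] should succeed, while a stream containing a Left makes the parse fail. Pretty-printing the resulting error bundle would need extra stream instances that this sketch does not provide.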