Better error messages
kbeldjilali opened this issue · 2 comments
Hi all,
I'm trying to get better error messages for arithmetic expressions.
I took a simple example from here: https://medium.com/synerise/yet-another-arithmetic-parser-in-scala-43dad055d81f
The grammar:
Expr -> Term Expr'
Expr' -> + Term Expr' | - Term Expr' | Ɛ
Term -> Factor Term'
Term' -> * Factor Term' | / Factor Term' | Ɛ
Factor -> number | ( Expr )
Implementation :
import cats.parse.Parser._
import cats.parse._

val mult: Parser[Unit] = Parser.char('*')
val div: Parser[Unit] = Parser.char('/')
val plus: Parser[Unit] = Parser.char('+')
val minus: Parser[Unit] = Parser.char('-')

val epsilon = (lhs: Double) => lhs

def binaryOp[A](op: (A, A) => A)(tuple: (A, A => A)) = {
  val rhs = tuple._1
  val next = tuple._2
  (lhs: A) => next(op(lhs, rhs))
}

val number: Parser[Double] = Numbers.digits.map(_.toDouble)

def expr: Parser0[Double] = (term ~ exprP).map { case (lhs, next) => next(lhs) }

def exprP: Parser0[Double => Double] = Parser.defer0 {
  (plus *> term ~ exprP).map { binaryOp(_ + _) } |
  (minus *> term ~ exprP).map { binaryOp(_ - _) } |
  Parser.pure(epsilon)
}

def term = (factor ~ termP).map { case (lhs, next) => next(lhs) }

def termP: Parser0[Double => Double] = Parser.defer0 {
  (mult *> factor ~ termP).map { binaryOp(_ * _) } |
  (div *> factor ~ termP).map { binaryOp(_ / _) } |
  Parser.pure(epsilon)
}

def factor: Parser0[Double] = Parser.defer0 {
  number | expr.between(Parser.char('('), Parser.char(')'))
}
Given the input 1+2?3, I get the following error: Left(Error(3,NonEmptyList(EndOfString(3,5)))),
and I would like an error message that tells the user what is expected at offset 3, namely +|-|*|/
From what I understand, this kind of error is normal: at offset 3, the parser exprP fails on the first and second branches but succeeds on the last branch, Parser.pure(epsilon), so no errors are accumulated.
The parser has parsed everything it can, but the string was not fully consumed, so we get an EndOfString error because I used parseAll.
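The effect described above can be reproduced in isolation. The following is a minimal sketch, not part of the original code: the object name and the `strict` / `lenient` parsers are made up for illustration, and it only uses `Parser.char`, `Parser.pure`, `|` and `parseAll` as they already appear in this thread.

```scala
import cats.parse.{Parser, Parser0}

// Hypothetical names, for illustration only.
object EpsilonSwallowsErrors {
  val plus: Parser[Unit]  = Parser.char('+')
  val minus: Parser[Unit] = Parser.char('-')

  // No epsilon branch: both alternatives fail at offset 0,
  // so both expectations end up in the reported error.
  val strict: Parser[Unit] = plus | minus

  // With an epsilon branch the alternation always succeeds, so the
  // failures of `plus` and `minus` are discarded. parseAll then only
  // notices that the input was not fully consumed.
  val lenient: Parser0[Unit] = plus | minus | Parser.pure(())

  val strictResult: Either[Parser.Error, Unit]  = strict.parseAll("?")
  val lenientResult: Either[Parser.Error, Unit] = lenient.parseAll("?")
}
```

Both results should be a `Left` failing at offset 0, but only `strictResult` should mention `+` and `-`; `lenientResult` should carry a single end-of-string expectation, which is the same shape as the error reported for 1+2?3.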
Suppose I replace the first Parser.pure(epsilon) with Parser.pure(epsilon) <* Parser.end. We then get much better errors, Left(Error(3,NonEmptyList(InRange(3,+,+), InRange(3,-,-), EndOfString(3,5)))), even if the list is not exhaustive.
But this is wrong, because I can no longer write a parenthesized expression like (1+2+3).
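Once the error carries several expectations, it can be turned into the kind of message asked for above ("what is expected at offset 3"). This is a rough sketch under assumptions: `ErrorRender`, `describe` and `render` are hypothetical helpers, not part of cats-parse, and only the two expectation cases visible in this thread (`InRange`, `EndOfString`) are handled explicitly; anything else falls back to `toString`.

```scala
import cats.parse.Parser
import cats.parse.Parser.Expectation

// Hypothetical helper, not part of cats-parse.
object ErrorRender {
  // Describe a single expectation in user-facing terms.
  def describe(e: Expectation): String = e match {
    case Expectation.InRange(_, lo, hi) if lo == hi => s"'$lo'"
    case Expectation.InRange(_, lo, hi)             => s"'$lo'..'$hi'"
    case Expectation.EndOfString(_, _)              => "end of input"
    case other                                      => other.toString
  }

  // Join all expectations of an error into one line.
  def render(err: Parser.Error): String = {
    val wanted = err.expected.toList.map(describe).distinct.mkString(" | ")
    s"at offset ${err.failedAtOffset}: expected $wanted"
  }

  // Usage: a parser that accepts only '+' or '-', run on bad input.
  val sample: Either[String, Unit] =
    (Parser.char('+') | Parser.char('-')).parseAll("?").left.map(render)
}
```

The message is only as complete as the expectation list itself, so it does not fix the missing `*` and `/` entries; it just presents whatever the parser reports.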
Is this issue related to how the grammar is described?
I know there are other ways to describe this grammar, but I haven't found one that gives better error messages and still respects operator precedence.
Yes, I think your pure(epsilon) is the problem.
I think you can deal with the parenthesized case by parameterizing def termP(end: Parser[Unit]) = ...: when you recurse inside parens you pass the char(')') parser, but at the top level you pass Parser.end, or something like that.
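One possible reading of this suggestion is sketched below. It is an interpretation, not the maintainer's code. Assumptions: the terminator is typed `Parser0[Unit]` rather than `Parser[Unit]`, because `Parser.end` is a `Parser0`; the terminator is only peeked, so that `between` can still consume the closing paren; and `epsilonBefore`, `open`, `close` and `topLevel` are hypothetical names introduced here. The epsilon branch of `termP` also accepts a following `+` or `-`, since a term may be followed by either.

```scala
import cats.parse.{Numbers, Parser, Parser0}

object ArithWithTerminator {
  val mult: Parser[Unit]  = Parser.char('*')
  val div: Parser[Unit]   = Parser.char('/')
  val plus: Parser[Unit]  = Parser.char('+')
  val minus: Parser[Unit] = Parser.char('-')
  val open: Parser[Unit]  = Parser.char('(')
  val close: Parser[Unit] = Parser.char(')')

  val epsilon: Double => Double = (lhs: Double) => lhs

  def binaryOp[A](op: (A, A) => A)(tuple: (A, A => A)): A => A = {
    val (rhs, next) = tuple
    (lhs: A) => next(op(lhs, rhs))
  }

  val number: Parser[Double] = Numbers.digits.map(_.toDouble)

  // Hypothetical helper: consumes nothing, but succeeds only if one of
  // `follow` matches next. Unlike Parser.pure it can fail, so the
  // alternatives tried before it keep their expectations.
  def epsilonBefore(follow: Parser0[Unit]*): Parser0[Double => Double] =
    Parser.oneOf0(follow.toList).peek.as(epsilon)

  def expr(end: Parser0[Unit]): Parser[Double] =
    (term(end) ~ exprP(end)).map { case (lhs, next) => next(lhs) }

  def exprP(end: Parser0[Unit]): Parser0[Double => Double] = Parser.defer0 {
    (plus *> (term(end) ~ exprP(end))).map(t => binaryOp[Double](_ + _)(t)) |
      (minus *> (term(end) ~ exprP(end))).map(t => binaryOp[Double](_ - _)(t)) |
      epsilonBefore(end)
  }

  def term(end: Parser0[Unit]): Parser[Double] =
    (factor ~ termP(end)).map { case (lhs, next) => next(lhs) }

  def termP(end: Parser0[Unit]): Parser0[Double => Double] = Parser.defer0 {
    (mult *> (factor ~ termP(end))).map(t => binaryOp[Double](_ * _)(t)) |
      (div *> (factor ~ termP(end))).map(t => binaryOp[Double](_ / _)(t)) |
      epsilonBefore(plus, minus, end)
  }

  // Inside parentheses the terminator is ')'.
  lazy val factor: Parser[Double] = Parser.defer {
    number | expr(close).between(open, close)
  }

  // At the top level the terminator is the end of the input.
  val topLevel: Parser[Double] = expr(Parser.end)
}
```

With this sketch, `ArithWithTerminator.topLevel.parseAll("(1+2)*3")` should still succeed, while `parseAll("1+2?3")` should fail at offset 3 with expectations covering the operators and the end of input. Each recursive call builds a fresh deferred parser, which is wasteful; that is acceptable for an illustration but worth caching in real code.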
It is a good solution, thank you.
Out of curiosity, if we want exhaustive error messages, could we have a grammar designed for errors and then rework the parsed result to respect the semantics?
An alternative is to write two parsers, one designed for error messages and the other for semantics.
What is the common approach?