Error recovery when using `precedence!`
cylixlee opened this issue · 2 comments
Hi Kevin, I'm using peg to generate an arithmetic calculator that supports parenthesized expressions. The calculator takes several calculation statements, each a valid expression followed by a single semicolon `;`.
I want to add simple error recovery to this calculator: just skip everything up to the expression boundary. For example:
`1+2;` is a valid expression statement and is parsed as `Add(Number(1), Number(2))`.
`1+error;` is not valid because of the unrecognized right-hand-side expression of `+`, and is parsed as `Add(Number(1), Error)`.
In the cases above, I can just set the expression boundary to the semicolon `;`, skip everything that is not a semicolon, and continue parsing the next statements.
Of course, there's one case that's not very natural: `error + 1;` will produce an `Error` expression instead of `Add(Error, Number(1))`, but that's OK with me.
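To make the statement-level recovery described above concrete, here is a minimal hand-written sketch, not peg code. It assumes a deliberately simplified language in which each statement is a single number; `Statement` and `parse_statements` are hypothetical names used only for illustration. Anything that fails to parse becomes `Error`, and parsing resumes after the next `;`.

```rust
/// Hypothetical, simplified statement type: a lone number or an error marker.
#[derive(Debug, PartialEq)]
enum Statement {
    Number(f64),
    Error,
}

/// Splits the input at each `;` and parses every chunk independently.
/// A chunk that is not a valid number is recorded as `Statement::Error`,
/// and parsing continues with the text after that semicolon.
/// Text after the last semicolon is ignored in this sketch.
fn parse_statements(input: &str) -> Vec<Statement> {
    let mut statements = Vec::new();
    let mut rest = input;
    while let Some(end) = rest.find(';') {
        let body = rest[..end].trim();
        statements.push(match body.parse::<f64>() {
            Ok(n) => Statement::Number(n),
            Err(_) => Statement::Error,
        });
        // Skip past the semicolon (one byte) and keep going.
        rest = &rest[end + 1..];
    }
    statements
}
```

The key point is that the semicolon acts as a synchronization point: one bad statement does not stop the following ones from being parsed.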
The problem appears when it comes to grouping expressions (parenthesized expressions). Naturally, I wrote code like this:
#[derive(Debug)]
enum Expression {
    Number(f64),
    Add(Box<Expression>, Box<Expression>),
    Subtract(Box<Expression>, Box<Expression>),
    Multiply(Box<Expression>, Box<Expression>),
    Divide(Box<Expression>, Box<Expression>),
    // Special
    Error,
}

peg::parser!(grammar pegparser() for str {
    use std::str::FromStr;

    pub rule statements() -> Vec<Expression>
        = _ es:expression_statement()* _ { es }

    rule expression_statement() -> Expression
        = e:expression(';') _ ";" _ { e }

    rule expression(boundary: char) -> Expression = precedence! {
        x:(@) _ "+" _ y:@ { Expression::Add(Box::new(x), Box::new(y)) }
        x:(@) _ "-" _ y:@ { Expression::Subtract(Box::new(x), Box::new(y)) }
        --
        x:(@) _ "*" _ y:@ { Expression::Multiply(Box::new(x), Box::new(y)) }
        x:(@) _ "/" _ y:@ { Expression::Divide(Box::new(x), Box::new(y)) }
        --
        n:number() { n }
        "(" _ e:expression(')') _ ")" { e }
        [^boundary]+ { Expression::Error } /* here if I change [^boundary] to [^';'], it goes ok. */
    }

    rule _ = blank()*

    rule blank()
        = [' '|'\t'|'\r'|'\n']

    rule number() -> Expression
        = s:$(['0'..='9']+ ("." ['0'..='9']+)?) {
            match f64::from_str(s) {
                Ok(number) => Expression::Number(number),
                Err(e) => {
                    eprintln!("{}", e);
                    Expression::Error
                }
            }
        }
});

fn main() {
    println!("{:?}", pegparser::statements("1 + (error) + 2;"));
}
I'm expecting it to produce `Add(Add(Number(1), Error), Number(2))`, or at least `Add(Number(1), Error)`. However, it just fails and returns an `Err(ParseError)`:
ParseError {
    location: LineCol { line: 1, column: 6, offset: 5 },
    expected: ExpectedSet {
        expected: {
            "\"(\"",
            "[' '|'\\t'|'\\r'|'\\n']",
            "['0'..='9']",
            "[^boundary]"
        }
    }
}
When I change the line I marked with the comment, the result turns out OK: `Add(Number(1), Error)`. That's weird, because `boundary` is a `char` and should be acceptable in patterns, yet it just doesn't work. It can't even parse expressions without parentheses, like `1 + error;`.
I wonder whether my code is wrong, and whether there is a better solution.
Necessity
Since I want to use peg in a programming language parser, I can't just set the ';' as the expression boundary and skip everything.
Take this pseudo-code snippet as an example: `if (1 + error) {}`
I want to produce something like `IfStmt { condition: AddExpr(Number(1), Error) }` instead of a rough `Error` expression.
You're expecting `[^boundary]` to match any character other than the one passed in as an argument, but because of how PEG `[ ]` expands to a Rust pattern in an arm of a Rust `match` expression, that actually never matches anything.
`[x]` expands to a match arm with pattern `x`, and `^` flips the accepting and rejecting arms of the match. So `[^boundary]` expands to something like
match next_char {
    boundary => reject(),
    _ => accept(),
}
An identifier like `boundary` as a Rust pattern matches anything and captures it into a new variable, which in this case is ignored. That variable shadows the `boundary` argument, rather than comparing the character to it.
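The shadowing can be reproduced in plain Rust without peg. The sketch below is only an illustration of the pattern semantics, not peg's actual generated code; both function names are hypothetical, and `true`/`false` stand in for the accept/reject arms. The literal version mirrors the `[^';']` variant from the question, which does compare characters.

```rust
/// Same shape as the expansion of `[^boundary]`: the `boundary` in the
/// pattern is a fresh binding that matches every character, so the
/// rejecting arm is always taken and the parameter is never consulted.
#[allow(unused_variables, unreachable_patterns)]
fn accepts_with_identifier_pattern(next_char: char, boundary: char) -> bool {
    match next_char {
        boundary => false, // new variable shadowing the parameter
        _ => true,         // never reached
    }
}

/// Same shape as the expansion of `[^';']`: a literal pattern is a real
/// comparison, so only the semicolon is rejected.
fn accepts_with_literal_pattern(next_char: char) -> bool {
    match next_char {
        ';' => false,
        _ => true,
    }
}
```

Without the `#[allow(...)]` attribute, rustc warns that the `_` arm is unreachable and that the variable is unused, which is a useful hint that an identifier pattern is binding rather than comparing.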
A variable with `^` isn't very useful because it leads to the rejection arm, where you can't use the variable. It's most useful in cases with custom token types, where you can do `[MyTokenEnum::Ident(x)]` and then use the captured `x` in a subsequent block.
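As a rough illustration of why a binding pattern is useful on the accepting side, here is the equivalent plain-Rust `match` over a token enum. Only the name `MyTokenEnum::Ident` comes from the comment above; the other variants and the `ident_name` helper are assumptions made up for this sketch.

```rust
/// Hypothetical token type; only `Ident` is taken from the discussion.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum MyTokenEnum {
    Ident(String),
    Number(f64),
    Semicolon,
}

/// The pattern `MyTokenEnum::Ident(x)` both tests the variant and binds
/// its payload to `x`, which is then usable in the accepting arm.
fn ident_name(token: &MyTokenEnum) -> Option<&str> {
    match token {
        MyTokenEnum::Ident(x) => Some(x.as_str()),
        _ => None,
    }
}
```

Here the captured `x` lives in the accepting arm, so it can be used. With `^` the binding would end up in the rejecting arm instead, where nothing can consume it.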
Instead of `[^boundary]`, try `[c if c != boundary]`, which expands like
match next_char {
    c if c != boundary => accept(),
    _ => reject(),
}
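In the grammar from the question, the only change needed is replacing `[^boundary]+` with `[c if c != boundary]+`. To show what that guard does at runtime, here is a hand-written stand-in in plain Rust. It is a sketch of the matching behavior, not what peg generates, and `accepts_with_guard` and `skip_to_boundary` are hypothetical names.

```rust
/// Same shape as the expansion of `[c if c != boundary]`: `c` binds the
/// character and the guard compares it against the `boundary` argument.
fn accepts_with_guard(next_char: char, boundary: char) -> bool {
    match next_char {
        c if c != boundary => true,
        _ => false,
    }
}

/// Stand-in for `[c if c != boundary]+`: consumes one or more characters
/// that differ from `boundary` and returns the number of bytes consumed,
/// or `None` if the very first character is the boundary (or input is empty).
fn skip_to_boundary(input: &str, boundary: char) -> Option<usize> {
    let consumed: usize = input
        .chars()
        .take_while(|&c| accepts_with_guard(c, boundary))
        .map(char::len_utf8)
        .sum();
    if consumed > 0 {
        Some(consumed)
    } else {
        None
    }
}
```

With `')'` as the boundary, the text inside a parenthesized group is skipped only up to the closing parenthesis, so the surrounding expression can keep parsing after it.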
Ah, that's very clear to me now! Thanks, Kevin.