Unparsable grammar
chichid opened this issue · 6 comments
I think either this use case is broken or I understand the framework wrong.
The following case is a simplification of a grammar I'm writing. Can you please let me know if this an issue?
Grammar Definition:
A sequence is considered valid if:
1 - it has no "#"
2 - It starts with r# and then a valid sequence
examples: abc is valid, r#abc is valid, but a#bc is not valid
Proof
In the bellow example, r#abc should be parsed as valid since:
a(r#abc) = valid(r#abc) / ("r#" valid(abc)) <---- This group doesn't seem to "eat" the "r#" (is it expected?)
= ERR / (OK()) <--- It's also possible that my understand of this "/" is wrong (is this what's expected?)
= OK()
Grammar / Code
peg::parser!{
grammar deep_parser() for str {
pub rule a() = valid() / ("r#" valid())
rule valid() = f()*
rule f() -> char = [ch] {?
if ch != '#' {
Ok(ch)
} else {
Err("# is forbidden")
}
}
}
}
fn main() {
assert!(deep_parser::a("#####").is_err());
assert!(deep_parser::a("abc").is_ok());
assert!(deep_parser::a("r#abc").is_ok()); // <----------- This panics, and it shouldn't!
}
The order of operands of the /
operator is important. Just change rule a
to
pub rule a() = "r#" valid() / valid()
or even
pub rule a() = "r#"? valid()
@Mingun yeah I figured that, but I still don't fully understand why the general rule:
A / B is not equivalent to B / A
Based on the docs the operator / should evaluate A and if it fails it should evaluate B
Why is that different from: Evaluate B then stop since it's successful.
A / B is not equivalent to B / A
If A
and B
are fully independent then both variants are equivalent, but in you case B = C A
(where A=valid
, C="r#"
).
r#abc
is a valid()
+ "#abc"
, but grammar should consume the whole input, and in that case #abc
stay unconsumed which is lead to error
I'm going to close this issue to avoid being heavy :), but I still don't get it.
@Mingun I think it should still work, because "#abc" will be unconsumed in one branch ("r#" valid()) should act as fallback.
a(r#abc) =
= (valid(r#abc)) / ("r#" valid(abc))
= (valid() + #abc) / ("r#" valid(abc))
= ERR / ("r#" valid(abc))
= ERR / OK()
= OK()
I think, you should learn much more about PEG formalism. Unlike other grammar languages, PEG never backtrack if rule matched. In your example, when rule a()
match the r
part of the r#abc
input using valid()
variant, it will not backtrack to the ("r#" valid())
variant, because rule valid()
is matched successfully and input was consumed. This mean that a()
rule is also matched and the whole grammar consume only r
. Unconsumed #abc
then makes parser to return an error
Yeah I agree, I'm new to PEG so that's not very intuitive to me. But this answers the question thanks!