kevinmehall/rust-peg

Unparsable grammar

chichid opened this issue · 6 comments

I think either this use case is broken or I understand the framework wrong.
The following case is a simplification of a grammar I'm writing. Can you please let me know if this an issue?

Grammar Definition:
A sequence is considered valid if:
1 - it has no "#"
2 - It starts with r# and then a valid sequence
examples: abc is valid, r#abc is valid, but a#bc is not valid

Proof
In the bellow example, r#abc should be parsed as valid since:
a(r#abc) = valid(r#abc) / ("r#" valid(abc)) <---- This group doesn't seem to "eat" the "r#" (is it expected?)
= ERR / (OK()) <--- It's also possible that my understand of this "/" is wrong (is this what's expected?)
= OK()

Grammar / Code

peg::parser!{
    grammar deep_parser() for str {
        pub rule a() = valid() / ("r#" valid())
        
        rule valid() = f()* 

        rule f() -> char = [ch] {?
            if ch != '#' {
                Ok(ch)
            } else {
                Err("# is forbidden")
            }
        }
    }
}

fn main() {
    assert!(deep_parser::a("#####").is_err());
    assert!(deep_parser::a("abc").is_ok());
    assert!(deep_parser::a("r#abc").is_ok()); // <----------- This panics, and it shouldn't!  
}

The order of operands of the / operator is important. Just change rule a to

pub rule a() = "r#" valid() / valid()

or even

pub rule a() = "r#"? valid()

@Mingun yeah I figured that, but I still don't fully understand why the general rule:
A / B is not equivalent to B / A

Based on the docs the operator / should evaluate A and if it fails it should evaluate B
Why is that different from: Evaluate B then stop since it's successful.

A / B is not equivalent to B / A

If A and B are fully independent then both variants are equivalent, but in you case B = C A (where A=valid, C="r#").

r#abc is a valid() + "#abc", but grammar should consume the whole input, and in that case #abc stay unconsumed which is lead to error

I'm going to close this issue to avoid being heavy :), but I still don't get it.

@Mingun I think it should still work, because "#abc" will be unconsumed in one branch ("r#" valid()) should act as fallback.

a(r#abc) =
= (valid(r#abc)) / ("r#" valid(abc)) 
= (valid() + #abc) / ("r#" valid(abc)) 
= ERR / ("r#" valid(abc)) 
= ERR / OK()
= OK()

I think, you should learn much more about PEG formalism. Unlike other grammar languages, PEG never backtrack if rule matched. In your example, when rule a() match the r part of the r#abc input using valid() variant, it will not backtrack to the ("r#" valid()) variant, because rule valid() is matched successfully and input was consumed. This mean that a() rule is also matched and the whole grammar consume only r. Unconsumed #abc then makes parser to return an error

Yeah I agree, I'm new to PEG so that's not very intuitive to me. But this answers the question thanks!