Unparsable grammar

Question

Unparsable grammar

chichid opened this issue 2 years ago · 6 comments

I think either this use case is broken or I understand the framework wrong.
The following case is a simplification of a grammar I'm writing. Can you please let me know if this an issue?

Grammar Definition:
A sequence is considered valid if:
1 - it has no "#"
2 - It starts with r# and then a valid sequence
examples: abc is valid, r#abc is valid, but a#bc is not valid

Proof
In the bellow example, r#abc should be parsed as valid since:
a(r#abc) = valid(r#abc) / ("r#" valid(abc)) <---- This group doesn't seem to "eat" the "r#" (is it expected?)
= ERR / (OK()) <--- It's also possible that my understand of this "/" is wrong (is this what's expected?)
= OK()

Grammar / Code

peg::parser!{
    grammar deep_parser() for str {
        pub rule a() = valid() / ("r#" valid())
        
        rule valid() = f()* 

        rule f() -> char = [ch] {?
            if ch != '#' {
                Ok(ch)
            } else {
                Err("# is forbidden")
            }
        }
    }
}

fn main() {
    assert!(deep_parser::a("#####").is_err());
    assert!(deep_parser::a("abc").is_ok());
    assert!(deep_parser::a("r#abc").is_ok()); // <----------- This panics, and it shouldn't!  
}

Answer 1 · 2022-06-28T03:57:54.000Z

The order of operands of the / operator is important. Just change rule a to

pub rule a() = "r#" valid() / valid()

or even

pub rule a() = "r#"? valid()

Answer 2 · 2022-06-28T04:13:41.000Z

@Mingun yeah I figured that, but I still don't fully understand why the general rule:
A / B is not equivalent to B / A

Based on the docs the operator / should evaluate A and if it fails it should evaluate B
Why is that different from: Evaluate B then stop since it's successful.

Answer 3 · 2022-06-28T07:43:35.000Z

A / B is not equivalent to B / A

If A and B are fully independent then both variants are equivalent, but in you case B = C A (where A=valid, C="r#").

r#abc is a valid() + "#abc", but grammar should consume the whole input, and in that case #abc stay unconsumed which is lead to error

Answer 4 · 2022-06-28T15:37:43.000Z

I'm going to close this issue to avoid being heavy :), but I still don't get it.

@Mingun I think it should still work, because "#abc" will be unconsumed in one branch ("r#" valid()) should act as fallback.

a(r#abc) =
= (valid(r#abc)) / ("r#" valid(abc)) 
= (valid() + #abc) / ("r#" valid(abc)) 
= ERR / ("r#" valid(abc)) 
= ERR / OK()
= OK()

Answer 5 · 2022-06-28T16:36:31.000Z

I think, you should learn much more about PEG formalism. Unlike other grammar languages, PEG never backtrack if rule matched. In your example, when rule a() match the r part of the r#abc input using valid() variant, it will not backtrack to the ("r#" valid()) variant, because rule valid() is matched successfully and input was consumed. This mean that a() rule is also matched and the whole grammar consume only r. Unconsumed #abc then makes parser to return an error

Answer 6 · 2022-06-28T16:47:43.000Z

Yeah I agree, I'm new to PEG so that's not very intuitive to me. But this answers the question thanks!