pest-parser/pest

Impossible parse?

nitingupta910 opened this issue · 4 comments

I have a parse rule like:

dotted_alpha = { (ASCII_ALPHA | ".")* ~ ASCII_ALPHA }

to parse strings that contain any mix of "." and ASCII_ALPHA characters, where the last character must be an ASCII_ALPHA. However, even a simple input fails to parse with this rule:

test code

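    use pest::Parser; // trait that provides NTriplesParser::parse
    use super::*;     // assumes NTriplesParser and Rule live in the parent module

    #[test]
    fn test_dotted_alpha() {
        let s = "a.fa.gs.ab";
        let dotted_alpha = match NTriplesParser::parse(Rule::dotted_alpha, s) {
            // On success, take the first (outermost) pair and its matched text.
            Ok(mut da) => da.next().unwrap().as_str(),
            Err(e) => {
                println!("====== Err parsing dotted_alpha: {}", e);
                // Fail the test instead of returning, which would let it pass silently.
                panic!("dotted_alpha failed to parse");
            }
        };
        println!("got dotted_alpha: {}", dotted_alpha);
        assert_eq!(dotted_alpha, s);
    }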

output

====== Err parsing dotted_alpha:  --> 1:1
  |
1 | a.fa.gs.ab
  | ^---
  |
  = expected dotted_alpha

I got it working with:

dotted_alpha = { (ASCII_ALPHA* ~ ".")* ~ ASCII_ALPHA+ }

This seems unintuitive, and it would be hard to come up with when translating an EBNF-style grammar to PEG.
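A rough way to see why this version works is to model the rule by hand. The sketch below is a simplified, hypothetical model of PEG evaluation written for illustration (the functions alpha, dot, star and working are not part of pest); it assumes, as pest does, that a sequence which fails partway through restores the input position. The key detail is that each iteration of (ASCII_ALPHA* ~ ".") must end in a period, so the final "ab" fails as an iteration and is left for ASCII_ALPHA+.

```rust
// Hypothetical hand-written PEG-style model (not pest's API) of
//     dotted_alpha = { (ASCII_ALPHA* ~ ".")* ~ ASCII_ALPHA+ }
// Each helper takes the input and a start position and returns the end
// position on success, or None on failure.

fn alpha(s: &[u8], pos: usize) -> Option<usize> {
    match s.get(pos) {
        Some(c) if c.is_ascii_alphabetic() => Some(pos + 1),
        _ => None,
    }
}

fn dot(s: &[u8], pos: usize) -> Option<usize> {
    match s.get(pos) {
        Some(&b'.') => Some(pos + 1),
        _ => None,
    }
}

/// PEG `e*`: apply `e` repeatedly, keep every match, never give input back.
fn star(s: &[u8], mut pos: usize, e: impl Fn(&[u8], usize) -> Option<usize>) -> usize {
    while let Some(next) = e(s, pos) {
        if next == pos {
            break; // guard against looping forever on an empty match
        }
        pos = next;
    }
    pos
}

/// Returns Some(end) if the rule matches a prefix of `s` starting at 0.
fn working(s: &[u8]) -> Option<usize> {
    // Each outer iteration is the sequence `ASCII_ALPHA* ~ "."`. When the "."
    // is missing (the trailing "ab"), the whole iteration fails, its input is
    // restored, and the repetition stops *before* "ab".
    let pos = star(s, 0, |s, p| dot(s, star(s, p, alpha)));
    // ASCII_ALPHA+ : one letter, then any further letters.
    let after_first = alpha(s, pos)?;
    Some(star(s, after_first, alpha))
}
```

Because ASCII_ALPHA* may match zero letters, the same model also accepts consecutive periods: working(b"a..b") returns Some(4).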

PEGs are conceptually different from EBNF. You can expect a PEG to parse practically anything you would express with a CFG; you just have to think in terms of PEG.

PEGs are designed to recognize languages rather than generate them, which is why they are popular for writing parsers.

CAD97 commented

Also, note that your PEG version is different from the BNF one: the BNF has no issue with multiple "." characters in a row, whereas the PEG prohibits it. (edit: I misread your grammar; this is incorrect.)

The literal translation fails because of how PEG works. PEG repetition is greedy and never backtracks: (ASCII_ALPHA | ".")* consumes every ASCII alphabetic character and period it can and never gives any of them back, so the trailing ~ ASCII_ALPHA has nothing left to match.
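The difference can be made concrete with a small Rust model of the literal translation under two readings. Both functions are hypothetical and simplified: peg_match follows the greedy, no-backtracking behavior described above, while backtracking_match models how an EBNF or regex engine would read the same rule by retrying shorter runs for the repetition. Neither is pest's implementation; each returns the end of the matched prefix, or None.

```rust
// Hypothetical sketch (not pest's API) contrasting two readings of
//     (ASCII_ALPHA | ".")* ~ ASCII_ALPHA
//  - peg_match: PEG semantics, `*` is greedy and never backtracks;
//  - backtracking_match: EBNF/regex-style semantics, `*` may give
//    characters back so the rest of the rule can match.

fn is_alpha_or_dot(c: u8) -> bool {
    c.is_ascii_alphabetic() || c == b'.'
}

/// PEG: consume every letter and dot, then require one more letter.
fn peg_match(s: &[u8]) -> Option<usize> {
    let pos = s.iter().take_while(|&&c| is_alpha_or_dot(c)).count();
    // s[pos] (if it exists) is neither a letter nor a dot, so this always fails.
    match s.get(pos) {
        Some(c) if c.is_ascii_alphabetic() => Some(pos + 1),
        _ => None,
    }
}

/// Backtracking: try the longest run for `*` first, then shorter ones,
/// until the trailing ASCII_ALPHA can match.
fn backtracking_match(s: &[u8]) -> Option<usize> {
    let longest = s.iter().take_while(|&&c| is_alpha_or_dot(c)).count();
    (0..=longest).rev().find_map(|len| match s.get(len) {
        Some(c) if c.is_ascii_alphabetic() => Some(len + 1),
        _ => None,
    })
}
```

Under the PEG reading the rule fails even on "a", because any letter the final ASCII_ALPHA could match has already been consumed by the repetition; only the backtracking reading accepts "a.fa.gs.ab".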

Closing this for now. Feel free to reopen it if you have more questions.