m4rw3r/chomp

`or` with many branches

Closed this issue · 5 comments

I need a parser with multiple branches. My first attempt was:

or(i, parser! { string(b"ERROR"); ret LogLevel::Error }, |i| {
    or(i, parser! { string(b"WARN"); ret LogLevel::Warn }, |i| {
        or(i,
           parser! { string(b"INFO"); ret LogLevel::Info },
           parser! { string(b"DEBUG"); ret LogLevel::Debug })
    })
})

It works, but it feels too verbose, so I tried to write this macro:

macro_rules! alt {
    ($i:expr, $a:expr) => { $a };
    ($i:expr, $a:expr, $b:expr) => { or($i, $a, $b) };
    ($i:expr, $a:expr, $($b:expr),*) => { or($i, $a, |i| alt!(i, $($b),*)) };
}

Now, the parser looks much better:

alt!(i,
    { parser! { string(b"ERROR"); ret LogLevel::Error } },
    { parser! { string(b"WARN");  ret LogLevel::Warn } },
    { parser! { string(b"INFO");  ret LogLevel::Info } },
    { parser! { string(b"DEBUG"); ret LogLevel::Debug } }
)

I have some questions:

  • Is this a good solution? Is there any alternative?
  • Can you include something like this in the crate? I think that more people will need a multi-or combinator.

Thanks!

I have been thinking about how to solve this for a while. Attoparsec and Parsec uses the Alternative typeclass to allow multiple alternations by using the <|> infix operator. I decided to hold off on implementing any macro like alt! to explore other options, as well as to see how well stacking the or combinator scales.

An infix combinator (or method) would be nice but is not possible due to the eager evaluation of the parsers in Chomp and the fact that it does not backtrack on errors (like Parsec does). The or combinator stores the position before evaluating the left hand side, and uses the stored position to evaluate the right hand side if the left hand side fails. With a lazy evaluation of the Parser monad we could possibly use eg. + or | for this, or an or method on ParseResult, but that requires Abstract output type parameters to land.

So, to list a few possible solutions I have thought of:

Tuple

Making an alt combinator function taking a tuple as an argument. This may have some merit but as in #26 this also has the issue of preventing type-inference of Input to the closures used in the tuples (so the parser! macro needs to be augmented with types and possibly lifetimes, which can be a huge issue).

Macro

It is essentially the macro you have provided above. The issue with macros is that they do not integrate nicely with the parse! macro:

parse!{i;
    // This works nicely:
    let a = or(decimal, hexadecimal);
    let b = alt!(this, will, fail, badly);
    let c = i -> alt!(i, this, will, work);
    ret (a, b, c)
};

With the inline action form it might be acceptable, and it is useful outside of the parse! macro.

Improve parse!

Improving parse! to support more syntax (ie. the Alternative typeclass infix operator <|>) might be another option:

parse!{i;
    let a = decimal() <|> hexadecimal(); // can use <|> due to ACTION being `$expr (...)`
    let b = this() <|> will() <|> work();
    ret (a, b)
}

It will not work with the INLINE_ACTION type (ie. i -> $expr) sine it ends in an expression which require , or ; to end. Could maybe be solved with parentheses. It is also very similar to the alt! macro when only used as an alteration one-liner:

let level = |i, b, r| string(i, b).map(|_| r);
parse!{i; level(b"ERROR", Error)   <|>
          level(b"WARN",  Warning) <|>
          level(b"INFO",  Info)    <|>
          level(b"DEBUG", Debug)}

Slice

Slice would be nice but requires indirection (ie. trait objects), so this is out.

I am currently leaning towards improving the parse! macro. The basic idea is to extend and generalize the grammar:

Block     ::= Statement* Expr
Statement ::= (Bind | Expr) ';'
Bind      ::= 'let' Var '=' Expr
Var       ::= $ident ':' $ty | $pat
Expr      ::= Term | Expr Operator Term
Term      ::= Ret | Err | Inline | Named | '(' Expr ')'
Ret       ::= "ret" (Typed | $expr)
Err       ::= "err" (Typed | $expr)
Typed     ::= '@' $ty ',' $ty ':' $expr
Inline    ::= $ident "->" $expr
Named     ::= $ident '(' ($expr ',')* (',')* ')'
Operator  ::= "<|>" | "<*" | ">>"

This should allow for a lot more flexibility in the use of the parse! macro:

// alternation:
parse!{i; a() <|> b() <|> c()}
// skipping:
parse!{i; decimal() <* token(b';')}
// inline ret and err
parse!{i; (ret 2) <|> err "will never happen"}

And most of this will be usable inline. If I can integrate this further with the #26 it should make the parse! macro very powerful.

+1 on this, thanks @m4rw3r for the pros and cons of each approach.

I really like the <|> solution (-:

I have now managed to implement the following grammar with right-associative operators in the Expr without any operator precedence in #31