SchrodingerZhu/paguroidea

Simplify grammar

QuarticCat opened this issue · 2 comments

Here are some possible ergonomic improvements.

  1. Make active the default so that users only need to mark silent rules. Silent rules could then be marked with a special character or a naming convention to keep the grammar clean.
  2. Combine lexer definitions and tokens. (implemented in #56)
    1. To ensure all tokens are terminal, we only need to check that the reference graph is a DAG and then inline all rules.
    2. To avoid generating extra lexers, we can delay lexer generation until after parser generation, and then inline and generate lexers on demand.
  3. Combine parser definitions and fixpoints. (implemented in #47)
    1. We could infer fixpoints automatically. One possible algorithm is to find the cycles in the reference graph and mark every vertex on a cycle as a fixpoint.
  4. Remove ~ (sequence operator). Instead of writing e1 ~ e2, we can simply write e1 e2.
  5. Allow ad-hoc lexical rules, i.e. literals written inline in a parser rule. For example, "(" ~ sexprs ~ ")".
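The check in item 2.1 can be sketched as a depth-first search over the lexical reference graph. Reaching a rule that is still on the current DFS path is a back edge, so the graph is not a DAG. Otherwise, every reference is replaced by the referenced rule's already-inlined body. The Lex AST, Inliner, and inline_all below are hypothetical names for illustration only, not paguroidea's actual types or API.

```rust
use std::collections::HashMap;

// Hypothetical lexical-rule AST for illustration; not paguroidea's real types.
#[derive(Clone, Debug, PartialEq)]
enum Lex {
    Char(char),
    Seq(Vec<Lex>),
    Alt(Vec<Lex>),
    Star(Box<Lex>),
    Ref(String), // reference to another named lexical rule
}

enum Mark {
    Visiting, // on the current DFS path
    Done,     // fully inlined; result cached in `done`
}

struct Inliner<'a> {
    rules: &'a HashMap<String, Lex>,
    marks: HashMap<String, Mark>,
    done: HashMap<String, Lex>,
}

impl<'a> Inliner<'a> {
    // Returns the fully inlined body of `name`, or an error if `name` is
    // undefined or lies on a cycle (a back edge in the DFS).
    fn rule(&mut self, name: &str) -> Result<Lex, String> {
        match self.marks.get(name) {
            Some(Mark::Done) => return Ok(self.done[name].clone()),
            Some(Mark::Visiting) => return Err(format!("cycle through `{name}`")),
            None => {}
        }
        let rules = self.rules;
        let body = rules.get(name).ok_or_else(|| format!("undefined rule `{name}`"))?;
        self.marks.insert(name.to_string(), Mark::Visiting);
        let inlined = self.expr(body)?;
        self.marks.insert(name.to_string(), Mark::Done);
        self.done.insert(name.to_string(), inlined.clone());
        Ok(inlined)
    }

    fn expr(&mut self, e: &Lex) -> Result<Lex, String> {
        Ok(match e {
            Lex::Char(c) => Lex::Char(*c),
            Lex::Seq(xs) => Lex::Seq(self.list(xs)?),
            Lex::Alt(xs) => Lex::Alt(self.list(xs)?),
            Lex::Star(x) => Lex::Star(Box::new(self.expr(x)?)),
            Lex::Ref(n) => self.rule(n)?, // substitute the referenced rule's body
        })
    }

    fn list(&mut self, xs: &[Lex]) -> Result<Vec<Lex>, String> {
        xs.iter().map(|x| self.expr(x)).collect()
    }
}

/// Checks that the lexical reference graph is a DAG and inlines every rule.
fn inline_all(rules: &HashMap<String, Lex>) -> Result<HashMap<String, Lex>, String> {
    let mut inliner = Inliner { rules, marks: HashMap::new(), done: HashMap::new() };
    for name in rules.keys() {
        inliner.rule(name)?;
    }
    Ok(inliner.done)
}
```

Caching finished rules in done means each rule is expanded only once. The inlined trees are still cloned, though, so the output can grow large when many rules share the same sub-definitions.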
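The fixpoint inference in item 3.1 can be built on any strongly-connected-components algorithm. A rule lies on a cycle exactly when its SCC has more than one member or the rule references itself. The sketch below uses Tarjan's algorithm. It assumes that rules are numbered 0..n and that graph[v] lists the rules that rule v references. infer_fixpoints is a hypothetical helper, not part of paguroidea.

```rust
/// Returns, for each rule, whether it lies on a cycle of the reference graph
/// (i.e. whether the heuristic would mark it as a fixpoint).
fn infer_fixpoints(graph: &[Vec<usize>]) -> Vec<bool> {
    // State for Tarjan's strongly-connected-components algorithm.
    struct Tarjan<'g> {
        graph: &'g [Vec<usize>],
        index: Vec<Option<usize>>, // DFS discovery order
        low: Vec<usize>,           // lowest index reachable from the DFS subtree
        on_stack: Vec<bool>,
        stack: Vec<usize>,
        next: usize,
        cyclic: Vec<bool>,
    }

    impl Tarjan<'_> {
        fn visit(&mut self, v: usize) {
            let graph = self.graph;
            self.index[v] = Some(self.next);
            self.low[v] = self.next;
            self.next += 1;
            self.stack.push(v);
            self.on_stack[v] = true;
            for &w in &graph[v] {
                match self.index[w] {
                    None => {
                        self.visit(w);
                        self.low[v] = self.low[v].min(self.low[w]);
                    }
                    Some(iw) if self.on_stack[w] => self.low[v] = self.low[v].min(iw),
                    Some(_) => {} // already assigned to a finished SCC
                }
            }
            if Some(self.low[v]) == self.index[v] {
                // v is the root of an SCC: pop all of its members.
                let mut scc = Vec::new();
                loop {
                    let w = self.stack.pop().unwrap();
                    self.on_stack[w] = false;
                    scc.push(w);
                    if w == v {
                        break;
                    }
                }
                // An SCC contains a cycle if it has several vertices or a self-loop.
                let cyclic = scc.len() > 1 || graph[v].contains(&v);
                for w in scc {
                    self.cyclic[w] = cyclic;
                }
            }
        }
    }

    let n = graph.len();
    let mut t = Tarjan {
        graph,
        index: vec![None; n],
        low: vec![0; n],
        on_stack: vec![false; n],
        stack: Vec::new(),
        next: 0,
        cyclic: vec![false; n],
    };
    for v in 0..n {
        if t.index[v].is_none() {
            t.visit(v);
        }
    }
    t.cyclic
}
```

For example, with rule 0 and rule 1 referencing each other, rule 2 referencing itself, and rule 3 referencing only rule 2, rules 0 to 2 would be marked and rule 3 would not.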

We are thinking of extending our system so that parsers can output arbitrary data types, not only trees.

However, this makes TCO (tail-call optimization) harder to apply, so it is still not clear to me how the design should go.

It seems to me that we can split rules into two kinds (not counting the offset and src parameters):

  • A negative rule that accepts a &mut Consumer (for example, a &mut Vec<T>) and returns Result<(), Error>. This form can be tail-call optimised.
  • A positive rule that accepts no consumer and returns Result<T, Error>.
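To make the two kinds of rules concrete, here is a hedged Rust sketch for a hypothetical grammar, number ::= digit digits and digits ::= digit digits | ε. The Consumer trait, the Error type, and passing the offset as &mut usize are all assumptions, not paguroidea's design. Rust does not guarantee tail-call elimination, so the negative rule's self tail call is written as the loop that TCO would produce.

```rust
// Hypothetical error type and consumer trait; the issue names a Consumer
// but does not define it.
#[derive(Debug, PartialEq)]
struct Error {
    offset: usize,
}

trait Consumer<T> {
    fn consume(&mut self, item: T);
}

// As suggested in the issue, a Vec<T> can act as the consumer.
impl<T> Consumer<T> for Vec<T> {
    fn consume(&mut self, item: T) {
        self.push(item);
    }
}

// A consumer that folds into a running sum without allocating.
struct Sum(u32);
impl Consumer<u32> for Sum {
    fn consume(&mut self, item: u32) {
        self.0 += item;
    }
}

// Positive rule: no consumer, returns its value directly.
// Advances `offset` only on success.
fn digit(src: &[u8], offset: &mut usize) -> Result<u32, Error> {
    match src.get(*offset) {
        Some(&b) if b.is_ascii_digit() => {
            *offset += 1;
            Ok(u32::from(b - b'0'))
        }
        _ => Err(Error { offset: *offset }),
    }
}

// Negative rule for `digits ::= digit digits | ε`: results go into the
// caller's consumer and the return type is (), so the recursive call is a
// tail call, written here as a loop.
fn digits<C: Consumer<u32>>(src: &[u8], offset: &mut usize, out: &mut C) -> Result<(), Error> {
    loop {
        match digit(src, offset) {
            Ok(d) => out.consume(d), // `digit`, then the tail call to `digits`
            Err(_) => return Ok(()), // the ε alternative
        }
    }
}

// Positive rule for `number ::= digit digits`: chooses a concrete consumer
// (a Vec) and returns the collected value.
fn number(src: &[u8], offset: &mut usize) -> Result<Vec<u32>, Error> {
    let mut out = vec![digit(src, offset)?];
    digits(src, offset, &mut out)?;
    Ok(out)
}
```

Because the negative rule only writes into its consumer, the same digits works with a Vec<u32> or with Sum. The positive rule is where a concrete output type gets chosen, which is one way arbitrary output types could coexist with TCO.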

However, it is not clear what will happen when we need to expand actions. We also need to figure out how to specify such rules properly.