Regular, Recursive, Restricted: Floyd!
Opened this issue · 0 comments
(I sent this as an email at first, but I'm not sure if it landed in your Spam folder; figured I'd try here)
So, there's a very tractable subset of the context-free grammars that's an exact match to your desires: Floyd grammars, AKA operator-precedence grammars.
They're parsed by either the Shunting-Yard algorithm (Donald Knuth, bottom-up) or Pratt parsing (Vaughan Pratt, top-down), and are closed under all the usual operations (more than CFG is!). They are also a deterministic class.
They have an operator precedence matrix, allowing them to represent a partial order of operator precedence, including operators that are incomparable and must be parenthesized; this is strictly more expressive than numeric precedence.
They do not have the nasty "undecidable emptiness" problems of CFG.
A good thesis that goes into depth about them, by Federica Panella: PDF
There are also generalizations of them to the world of infinite streams, where ω-grammars live.
EDIT: On re-reading the article after posting this, I realized that the approach you took has a strong resemblance to tree-adjoining grammars, which are a beyond-CFG formalism themselves. If you're ever interested in beyond-CFG formalisms with nice properties, I'd recommend two particular classes: the Boolean Grammars (most work done by Alexander Okhotin), and the Range Concatenation Grammars (initial work done by Pierre Boullier, subsequent work by Laura Kallmeyer et. al.)
The Boolean Grammars have the same computational complexity as the context-free grammars, but are closed under intersection and negation (...definitionally, as they're defined by CFG's closure under these operations). This allows for cool things like enforcing well-scoped variable usage in the grammar itself, as Okhotin demonstrates.
The Range Concatenation Grammars are more expressive - their computational power is exactly PTIME
. However, the actual computational cost of a particular grammar can be bounded by a simple syntactic check over the rules of the grammar. In addition, they are expressive enough to handle formalization of syntaxes like Rust's raw strings. As rules can have arities greater than one, they typically use a Prolog-like grammar syntax, rather than anything like EBNF. I have a comment from quite a while back on IRLO that gives some more context, and contains a grammar for raw strings.