Back-references and AST Nodes

Question

Back-references and AST Nodes

Closed this issue 3 years ago · 2 comments

If I am understanding back-references correctly, the general idea is that with a parser like the following,

const parser = peg`
add: <a>$addend '+' >a< ${({a}) => 2 * Number(a)}

$addend @raw: [0-9]+
`

The value of an input ran through this will either be twice whatever $addend resolves to, or a failure, as the back-reference cannot correctly match whatever the first value is.

Imagine a scenario where an AST is being constructed, where given non-terminals in the grammar generate nodes corresponding to matched text. Is is possible, with back-references, to exactly match the same AST found earlier in a production rule?

For example:

const parser2 = peg`
add: <a>multiply '+' >a< => 'ADD'

multiply: <a>$factor '*' <b>$factor => 'MULTIPLY'

$factor @raw: [0-9]+
`

The idea being that multiplications could have any arbitrary numbers multiplied together, forming a Node, which would then be exactly matched twice by the addition. So,

parser2.test('2*3 + 2*3') // pass
parser2.test('2*3 + 3*4') // fail

As far as I can tell, (as of 0.5.5) the back-reference in parser2 will fail, asserting that it is expecting an object. Is there enough context present at the point a back-reference might be used to attempt to reparse an exact match? Is this impossible to represent in PEGs, or grammars more broadly?

Thanks for your consideration.

Answer 1 · 2021-11-13T02:37:24.000Z

That's an interesting situation. The thing is that once the left hand side of + is parsed, the only context you have of it is a Node, which can be arbitrarily defined with fields unrelated to your captures, or even unrelated to the parsing for that matter. So there is no way Pegase can tell automatically from which sub-matches the Node was built from, therefor it can't enforce anything for the right hand side of +.

Theoretically it might be possible to extend back references to nodes, but the nodes would lose a lot of flexibility as Pegase would need more predictability about the fields. That's not what abstract syntax trees are for. The abstraction must be user-defined.

Your case can however simply be solved by capturing the two operand's ASTs (or reading $children) and comparing them as you see fit in a semantic action.

Does that answer the question?

PS: By the way, that increases my sentiment that nodes and ASTs shouldn't be part of the core Pegase library, but rather distributed as an addon package with plugin and helper functions, for example in @pegase/node. By being entangled with the core package, it gives off the vibe that there is something more to nodes than just custom objects emitted like any other value. In fact it's nothing more. Pegase is not able to derive any deeper context from them.

Answer 2 · 2021-11-13T03:17:06.000Z

Fair and understandable. I've developed an alternative approach for the use case abstracted above via a visitor, so the need for Nodes in back-references isn't necessary. Thanks for the sanity check.