antlr/antlr4

left recursive rule references and operator precedence

MisterErwin opened this issue · 2 comments

Hey there,

when using left recursive grammars with operator precedence, I am unable to split my left recursive rule into multiple ones while maintaining the operator precedence.

For example:

grammar TestGrammar;

start : expr EOF;
expr : notexpr | expr '&&' expr | ID;
notexpr : '!' expr;

ID: [a-zA-Z_][a-zA-Z_0-9]* ;

An input string of !a && a results in a parse tree is the equivalent to !(a && a), ignoring the precedence. I would have expected a result such as (!a) && (a) instead.

The reason being, that operator precedence is only used with ordered alts and reset to 0 for references.
This is a snippet from the generated (Java) parser of this grammar:

	private ExprContext expr(int _p) throws RecognitionException {
		enterRecursionRule(_localctx, 2, RULE_expr, _p);
		try {
			int _alt;
			enterOuterAlt(_localctx, 1);
			{
			setState(12);
			_errHandler.sync(this);
			switch (_input.LA(1)) {
			case EXCL:
				{
				setState(10);
				notexpr();        // no precedence
				}
				break;
			case ID:

And here, the same part for a grammar where notexpressions are inlined/not hidden behind another production.

	private ExprContext expr(int _p) throws RecognitionException {
		enterRecursionRule(_localctx, 2, RULE_expr, _p);
		try {
			int _alt;
			enterOuterAlt(_localctx, 1);
			{
			setState(13);
			_errHandler.sync(this);
			switch (_input.LA(1)) {
			case EXCL:
				{
				setState(10);
				match(EXCL);
				setState(11);
				expr(3);     // precedence of 3
				}
				break;
			case ID:

The reason for hiding productions behind another is that we/I derive grammars from other grammars, resulting in ... larger grammars.
Due reusing productions (such as expressions) and the need to parse productions stand alone (e.g., a not expression), the generated java parsers are reaching the limits allowed in Java.

I spend some time to think about a possible solution in ANTLR:
My suggesting is to introduce a "delegated/indirect operator precedence" parameter to some methods,
which passes the operator precedence through (such as for the not expression in the example above).

I've only implemented it for Java right now to receive some feedback:
dev...MisterErwin:antlr4:ind-operator-prec

(It contains three smaller tests, but I also ran some larger tests with the root cause of this issue 😄 )

In case the change seems okay, I would go ahead and prepare a PR for all languages (and not just a prototype in Java).

Thanks
Alex

P.S.: @ kaby76, thanks for helping me with this over on stackoverflow - sorry for not being able to upvote the answers.

An input string of !a && a results in a parse tree is the equivalent to !(a && a), ignoring the precedence. I would have expected a result such as (!a) && (a) instead.

From my perspective it looks correct because notexpr has higher priority. You can try to use the following grammar:

expr : expr '&&' expr | notexpr | ID;

Thanks, but the change results in the same tree/result.
The original example:
Example on the original grammar
And with your grammar:
Example, but with the example by kvanTTT

But if I inline the rule (expr : '!' expr | expr '&&' expr | ID;), I receive the expected tree:
notexpr rule inlined result

And finally (to really increase the size of this comment), here is the resulting tree with the original grammar and the changes of my branch:
Original grammar with the proposed changes

€: The trees are generated using TestRig and I've added the fourth one showing the expected result.