Potential bug with indirect left-recursion
LPeter1997 opened this issue · 2 comments
I've isolated a 3-rule pattern, where I'd expect the parser to succeed for a given input, but it fails instead. The sample grammar:
RuleA ::= RuleB.
RuleB ::= IDENTIFIER
| RuleC '.' IDENTIFIER
.
RuleC ::= RuleB
| RuleA
.
The implementation:
import scala.util.parsing.combinator._
import scala.util.parsing.combinator.syntactical.StandardTokenParsers
object SimpleParser extends StandardTokenParsers with PackratParsers {
lexical.delimiters ++= List(".", "$")
lazy val ruleA: PackratParser[String] = ruleB <~ "$"
lazy val ruleB: PackratParser[String] = (ident
||| ruleC ~> "." ~> ident)
lazy val ruleC: PackratParser[String] = (ruleB
||| ruleA)
def main(args: Array[String]) = {
println(ruleA(new PackratReader(new lexical.Scanner("x.x$"))))
}
}
It fails for input x.x$
, telling me that it expects a $
instead of the .
.
For me, this seems to be a problem with how indirect left-recursion is handled. I'm not sure if the original algorithm is incapable of handling this pattern, or this is an implementation bug.
Edit:
I've accidentally used |
(first matching alt.) instead of |||
(longest matching alt.), I've fixed that in the code, but doesn't change the outcome.
Perhaps an interesting/helpful finding: Introducing an "alias" rule for RuleB
seems to successfully parse x.x$
. The grammar:
RuleA ::= RuleB.
RuleB ::= IDENTIFIER
| RuleC '.' IDENTIFIER
.
RuleB' ::= IDENTIFIER
| RuleC '.' IDENTIFIER
.
RuleC ::= RuleB'
| RuleA
.
Implementation:
lazy val ruleA: PackratParser[String] = ruleB <~ "$"
lazy val ruleB: PackratParser[String] = (ident
||| ruleC ~> "." ~> ident)
lazy val ruleBvar: PackratParser[String] = (ident
||| ruleC ~> "." ~> ident)
lazy val ruleC: PackratParser[String] = (ruleBvar
||| ruleA)
However, it still rejects a valid input, x.x.x$
.
This makes me further believe that the problem is with indirect left-recursion.
I think the problem is the one identified in this paper: https://tratt.net/laurie/research/pubs/html/tratt__direct_left_recursive_parsing_expression_grammars/
That talks about an interaction with left- and right-hand recursion and identifies a problem with the Warth et al algorithm used in the PackratParsers.