scala/scala-parser-combinators

Potential bug with indirect left-recursion

LPeter1997 opened this issue · 2 comments

I've isolated a three-rule pattern for which I'd expect the parser to succeed on a given input, but it fails instead. The sample grammar:

RuleA ::= RuleB.
RuleB ::= IDENTIFIER
        | RuleC '.' IDENTIFIER
        .
RuleC ::= RuleB
        | RuleA
        .

The implementation:

import scala.util.parsing.combinator._
import scala.util.parsing.combinator.syntactical.StandardTokenParsers

object SimpleParser extends StandardTokenParsers with PackratParsers {
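  // "." and "$" must be registered as delimiters so the lexer emits them as tokens.
  lexical.delimiters ++= List(".", "$")

  // ruleA requires the "$" end marker after ruleB.
  lazy val ruleA: PackratParser[String] = ruleB <~ "$"
  // ruleB is left-recursive through ruleC; ||| picks the longest-matching alternative.
  lazy val ruleB: PackratParser[String] = (ident
                                       ||| ruleC ~> "." ~> ident)
  // ruleC closes the cycle: ruleB -> ruleC -> ruleB and ruleB -> ruleC -> ruleA -> ruleB.
  lazy val ruleC: PackratParser[String] = (ruleB
                                       ||| ruleA)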

  def main(args: Array[String]) = {
    println(ruleA(new PackratReader(new lexical.Scanner("x.x$"))))
  }
}

It fails for the input x.x$, reporting that it expected '$' but found '.'.

To me, this looks like a problem with how indirect left-recursion is handled. I'm not sure whether the original algorithm is incapable of handling this pattern or whether this is an implementation bug.

Edit:
I accidentally used | (first matching alternative) instead of ||| (longest matching alternative). I've fixed that in the code, but it doesn't change the outcome.

Perhaps an interesting/helpful finding: Introducing an "alias" rule for RuleB seems to successfully parse x.x$. The grammar:

RuleA ::= RuleB.
RuleB ::= IDENTIFIER
        | RuleC '.' IDENTIFIER
        .
RuleB' ::= IDENTIFIER
         | RuleC '.' IDENTIFIER
         .
RuleC ::= RuleB'
        | RuleA
        .

Implementation:

lazy val ruleA: PackratParser[String] = ruleB <~ "$"
lazy val ruleB: PackratParser[String] = (ident
                                     ||| ruleC ~> "." ~> ident)
lazy val ruleBvar: PackratParser[String] = (ident
                                        ||| ruleC ~> "." ~> ident)
lazy val ruleC: PackratParser[String] = (ruleBvar
                                     ||| ruleA)

However, it still rejects a valid input, x.x.x$.
This makes me further believe that the problem is with indirect left-recursion.
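
To pin down which inputs should be accepted, here is a minimal reference recognizer, written independently of scala-parser-combinators. It is only a sketch under stated assumptions: it follows the implementation as posted (where ruleA includes the "$" marker), unrolls the left recursion by hand into ruleB = ident ( "$"? "." ident )*, treats identifiers as plain letter runs (simpler than the real lexer), and requires ruleA to consume the whole input. The names ReferenceRecognizer, tokenize, extend and accepts are hypothetical and exist only for this illustration.

```scala
// Hypothetical reference recognizer; not part of scala-parser-combinators.
object ReferenceRecognizer {
  // Split the input into tokens: identifiers (letter runs), "." and "$".
  // Returns None if any other character is present. This is a simplification
  // of the real lexer, which accepts a richer identifier syntax.
  def tokenize(input: String): Option[List[String]] = {
    val token = "[A-Za-z]+|\\.|\\$".r
    val tokens = token.findAllIn(input).toList
    if (tokens.mkString == input) Some(tokens) else None
  }

  private def isIdent(t: String): Boolean = t.nonEmpty && t.head.isLetter

  // ruleB with the left recursion unrolled into iteration:
  //   ruleB = ident ( "$"? "." ident )*
  // Returns the tokens left over after the longest ruleB match, or None.
  def ruleB(tokens: List[String]): Option[List[String]] = tokens match {
    case id :: rest if isIdent(id) => Some(extend(rest))
    case _                         => None
  }

  // Consume as many ( "$"? "." ident ) suffixes as possible.
  @annotation.tailrec
  private def extend(tokens: List[String]): List[String] = tokens match {
    case "." :: id :: rest if isIdent(id)        => extend(rest)
    case "$" :: "." :: id :: rest if isIdent(id) => extend(rest)
    case _                                       => tokens
  }

  // ruleA = ruleB "$", required here to consume the entire input.
  def accepts(input: String): Boolean =
    tokenize(input).flatMap(ruleB) == Some(List("$"))
}
```

Under these assumptions, ReferenceRecognizer.accepts returns true for both x.x$ and x.x.x$, which matches the expectation that the packrat parser should accept them as well.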

I think the problem is the one identified in this paper: https://tratt.net/laurie/research/pubs/html/tratt__direct_left_recursive_parsing_expression_grammars/

It discusses the interaction between left and right recursion and identifies a problem with the Warth et al. algorithm used in PackratParsers.