How should Grammar objects be composed?

Question

KushalP opened this issue 5 years ago · 2 comments

The goal

I have a file that looks like this:

Header
Line
Line
Line
Footer

And I would like to parse it into the following container:

data class File(
  val header: Header,
  val lines: List<Line>,
  val footer: Footer
)

The problem

I have written a grammar for each of the different variants, i.e. Grammar<Header>, Grammar<Line>, and Grammar<Footer>. There is some similarity in how these different lines look. How can I compose these grammars together? The variants are separated by a newline (\n) character.

Any help would be much appreciated.

Answer 1 · 2020-04-22T22:06:40.000Z

You can compose grammars by reusing their rootParsers and all of their tokens, either in a separate tokenizer and parser pair or in another grammar.

The main trick here is that you can override val tokens in a grammar to make it recognize any tokens you specify.

Note that if the grammars for header, line, and footer define tokens that would be ambiguous once combined, you need to move those tokens out of the individual grammars into shared declarations and reference the shared tokens from each grammar.

Here's a simplified example of how I achieved this for two grammars. Composing three would be similar. :)

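// Imports assumed from the better-parse package layout; adjust them to the version you use.
// @Test comes from whichever test framework the project uses.
import com.github.h0tk3y.betterParse.combinators.*
import com.github.h0tk3y.betterParse.grammar.*
import com.github.h0tk3y.betterParse.lexer.*
import com.github.h0tk3y.betterParse.parser.*

data class Header(val text: String)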
data class Line(val words: List<String>)
data class Document(val header: Header, val lines: List<Line>)

val wordToken = token("\\w+\\b")
val ws = token("\\s+", ignore = true)
val sharedTokens = listOf(wordToken, ws)

object HeaderGrammar : Grammar<Header>() {
    private val tripleHash by literalToken("###")

    override val tokens: List<Token>
        get() = super.tokens + sharedTokens

    override val rootParser: Parser<Header> by (-tripleHash * wordToken).use { Header(text) }
}

object LineGrammar : Grammar<Line>() {
    override val tokens: List<Token>
        get() = super.tokens + sharedTokens

    override val rootParser: Parser<Line> by separatedTerms(wordToken, ws).use { Line(map { it.text }) }
}

object DocumentGrammar : Grammar<Document>() {
    private val NEWLINE by token("\n")

    override val tokens: List<Token>
        get() = (super.tokens + sharedTokens + HeaderGrammar.tokens + LineGrammar.tokens).distinct()

    override val rootParser: Parser<Document> by
        (HeaderGrammar.rootParser * -NEWLINE * separatedTerms(LineGrammar.rootParser, NEWLINE))
            .map { Document(it.t1, it.t2) }
}

class MyTestClass {
    @Test
    fun main() {
        val text = """
            ### Header
            x y
            y z
        """.trimIndent()

        println(DocumentGrammar.parseToEnd(text))
        // prints: Document(header=Header(text=Header), lines=[Line(words=[x, y]), Line(words=[y, z])])
    }
}
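
To illustrate the note about ambiguous tokens: the sketch below is not better-parse code. It is a simplified, standard-library-only model of a tokenizer that assumes the first token definition in the list that matches at the current position wins. TokenDef, tokenize, wordDef, wsDef, and newlineDef are hypothetical names made up for this illustration. It shows why the order of a combined token list matters when one pattern (\s+) overlaps another (\n).

```kotlin
// Hypothetical stand-in for a token definition; not a better-parse type.
data class TokenDef(val name: String, val pattern: Regex)

// Simplified first-match-wins tokenizer (an assumption for illustration):
// at each position, the first definition whose pattern matches there is used.
fun tokenize(input: String, defs: List<TokenDef>): List<String> {
    val names = mutableListOf<String>()
    var pos = 0
    while (pos < input.length) {
        var matched = false
        for (def in defs) {
            val m = def.pattern.find(input, pos)
            // Accept only non-empty matches that start exactly at the current position.
            if (m != null && m.range.first == pos && m.value.isNotEmpty()) {
                names += def.name
                pos = m.range.last + 1
                matched = true
                break
            }
        }
        require(matched) { "No token matches at position $pos" }
    }
    return names
}

val wordDef = TokenDef("word", Regex("\\w+"))
val wsDef = TokenDef("ws", Regex("\\s+"))
val newlineDef = TokenDef("newline", Regex("\n"))

fun main() {
    // Whitespace listed before newline: \s+ swallows the line break.
    println(tokenize("a b\nc", listOf(wordDef, wsDef, newlineDef)))
    // prints: [word, ws, word, ws, word]

    // Newline listed before whitespace: the line break becomes its own token.
    println(tokenize("a b\nc", listOf(wordDef, newlineDef, wsDef)))
    // prints: [word, ws, word, newline, word]
}
```

Under that assumption, a combined token list should place the more specific token (the newline) ahead of the general one (whitespace), which is what DocumentGrammar above does by putting super.tokens first.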

Answer 2 · 2020-04-23T15:20:25.000Z

Thanks for the explanation. This did the trick.