How should Grammar objects be composed?
KushalP opened this issue · 2 comments
The goal
I have a file that looks like this:
Header
Line
Line
Line
Footer
And I would like to parse it into the following container:
data class File(
val header: Header,
val lines: List<Line>,
val footer: Footer
)
The problem
I have written a grammar for each of the different variants, i.e. Grammar<Header>, Grammar<Line>, and Grammar<Footer>. There is some similarity in how these different lines look. How can I compose these grammars together? The variants are separated by a newline (\n) character.
Any help would be much appreciated.
It is possible to compose grammars by combining their rootParsers and all of their tokens in another tokenizer and parser pair, or in another grammar.
The main trick here is that you can override val tokens in a grammar to make it recognize any tokens you specify.
Note that if the header, line, and footer grammars have tokens that would be ambiguous when combined (for example, two different tokens that match the same text), you need to move those tokens out of the individual grammars into shared definitions and reuse those in each grammar.
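To illustrate why ambiguous tokens have to be shared, here is a toy, stdlib-only Kotlin sketch of a first-match tokenizer. It is a simplification written for this explanation, not better-parse's implementation: the names TokenType, Match, and tokenize are hypothetical, and it assumes the combined tokenizer tries token types in list order and keeps the first one that matches. If the header and line grammars each declared their own \w+ word token, only the one listed first would ever be produced, so a parser expecting the other token would never see it.

```kotlin
// Toy first-match tokenizer (hypothetical, NOT better-parse's implementation):
// at each position, the first token type in the list whose regex matches there wins.
data class TokenType(val name: String, val regex: Regex)
data class Match(val type: TokenType, val text: String)

fun tokenize(input: String, types: List<TokenType>): List<Match> {
    val result = mutableListOf<Match>()
    var pos = 0
    while (pos < input.length) {
        val match = types.asSequence()
            .mapNotNull { type ->
                // Accept only a non-empty match that starts exactly at `pos`
                type.regex.find(input, pos)
                    ?.takeIf { it.range.first == pos && it.value.isNotEmpty() }
                    ?.let { Match(type, it.value) }
            }
            .firstOrNull() ?: error("No token matches at position $pos")
        result += match
        pos += match.text.length
    }
    return result
}

fun main() {
    val headerWord = TokenType("headerWord", Regex("\\w+"))
    val lineWord = TokenType("lineWord", Regex("\\w+")) // same pattern as headerWord
    val ws = TokenType("ws", Regex("\\s+"))

    // Each grammar brought its own word token: lineWord can never win
    println(tokenize("x y", listOf(headerWord, lineWord, ws)).map { it.type.name })
    // prints: [headerWord, ws, headerWord]

    // One shared word token: every grammar expects the same token type
    val word = TokenType("word", Regex("\\w+"))
    println(tokenize("x y", listOf(word, ws)).map { it.type.name })
    // prints: [word, ws, word]
}
```

The shared-token approach in the example that follows avoids this by declaring wordToken and ws once and adding them to every grammar's token list.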
Here's a simplified example of how I achieved this for two grammars. Composing three would be similar. :)
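import com.github.h0tk3y.betterParse.combinators.*
import com.github.h0tk3y.betterParse.grammar.Grammar
import com.github.h0tk3y.betterParse.grammar.parseToEnd
import com.github.h0tk3y.betterParse.lexer.Token
import com.github.h0tk3y.betterParse.lexer.literalToken
import com.github.h0tk3y.betterParse.lexer.regexToken
import com.github.h0tk3y.betterParse.parser.Parser
import kotlin.test.Test // or org.junit.Test, depending on the test setup

data class Header(val text: String)
data class Line(val words: List<String>)
data class Document(val header: Header, val lines: List<Line>)

// Tokens shared by both grammars, declared once so the composed tokenizer has no duplicates
val wordToken = regexToken("\\w+\\b")
val ws = regexToken("\\s+", ignore = true)
val sharedTokens = listOf(wordToken, ws)

object HeaderGrammar : Grammar<Header>() {
    private val tripleHash by literalToken("###")

    // Tokens declared with `by` are registered automatically; add the shared ones explicitly
    override val tokens: List<Token>
        get() = super.tokens + sharedTokens

    override val rootParser: Parser<Header> by (-tripleHash * wordToken).use { Header(text) }
}

object LineGrammar : Grammar<Line>() {
    override val tokens: List<Token>
        get() = super.tokens + sharedTokens

    // One or more words separated by whitespace
    override val rootParser: Parser<Line> by separatedTerms(wordToken, ws).use { Line(map { it.text }) }
}

object DocumentGrammar : Grammar<Document>() {
    private val NEWLINE by literalToken("\n")

    // Union of this grammar's own tokens and the tokens of the composed grammars.
    // NEWLINE is listed first so that "\n" is tokenized as NEWLINE rather than as ws.
    override val tokens: List<Token>
        get() = (super.tokens + sharedTokens + HeaderGrammar.tokens + LineGrammar.tokens).distinct()

    // header NEWLINE line (NEWLINE line)*
    override val rootParser: Parser<Document> by
        (HeaderGrammar.rootParser * -NEWLINE * separatedTerms(LineGrammar.rootParser, NEWLINE))
            .map { Document(it.t1, it.t2) }
}

class MyTestClass {
    @Test
    fun main() {
        val text = """
            ### Header
            x y
            y z
        """.trimIndent()
        println(DocumentGrammar.parseToEnd(text))
        // prints: Document(header=Header(text=Header), lines=[Line(words=[x, y]), Line(words=[y, z])])
    }
}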
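Since composing three grammars should be similar, here is a sketch that extends the two-grammar example to the Header / Line / Footer layout from the question. It is a sketch under assumptions, not a tested recipe: the footer format (a line such as "--- end"), the Footer and FileGrammar definitions, and the use of better-parse 0.4.x's regexToken and literalToken are all choices made for illustration.

```kotlin
import com.github.h0tk3y.betterParse.combinators.*
import com.github.h0tk3y.betterParse.grammar.Grammar
import com.github.h0tk3y.betterParse.grammar.parseToEnd
import com.github.h0tk3y.betterParse.lexer.Token
import com.github.h0tk3y.betterParse.lexer.literalToken
import com.github.h0tk3y.betterParse.lexer.regexToken
import com.github.h0tk3y.betterParse.parser.Parser

data class Header(val text: String)
data class Line(val words: List<String>)
data class Footer(val text: String) // hypothetical: footer assumed to look like "--- <word>"
data class File(val header: Header, val lines: List<Line>, val footer: Footer)

// Tokens shared by every grammar, declared once
val wordToken = regexToken("\\w+\\b")
val ws = regexToken("\\s+", ignore = true)
val sharedTokens = listOf(wordToken, ws)

object HeaderGrammar : Grammar<Header>() {
    private val tripleHash by literalToken("###")
    override val tokens: List<Token> get() = super.tokens + sharedTokens
    override val rootParser: Parser<Header> by (-tripleHash * wordToken).use { Header(text) }
}

object LineGrammar : Grammar<Line>() {
    override val tokens: List<Token> get() = super.tokens + sharedTokens
    override val rootParser: Parser<Line> by separatedTerms(wordToken, ws).use { Line(map { it.text }) }
}

object FooterGrammar : Grammar<Footer>() {
    private val tripleDash by literalToken("---")
    override val tokens: List<Token> get() = super.tokens + sharedTokens
    override val rootParser: Parser<Footer> by (-tripleDash * wordToken).use { Footer(text) }
}

object FileGrammar : Grammar<File>() {
    private val newline by literalToken("\n") // listed first, so "\n" is not matched by ws

    override val tokens: List<Token>
        get() = (super.tokens + HeaderGrammar.tokens + LineGrammar.tokens + FooterGrammar.tokens).distinct()

    // header NEWLINE line (NEWLINE line)* NEWLINE footer
    override val rootParser: Parser<File> by
        (HeaderGrammar.rootParser * -newline *
            separatedTerms(LineGrammar.rootParser, newline) *
            -newline * FooterGrammar.rootParser)
            .map { File(it.t1, it.t2, it.t3) }
}

fun main() {
    val text = """
        ### Header
        x y
        y z
        --- end
    """.trimIndent()
    println(FileGrammar.parseToEnd(text))
}
```

This relies on separatedTerms not consuming a newline whose following line fails to parse (the line grammar fails on the "---" token), which leaves that newline for the explicit -newline before the footer.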
Thanks for the explanation. This did the trick.