Better document the grammar of the test format
gsnedders opened this issue · 1 comment
gsnedders commented
We currently document it only as a tree-construction-specific format. We should also give the generic definition that most people actually use.
It's something like the state machine below:
HEADER: "#([^\n]+)"
BODY: "([^#][^\n]*)"
LF: "\n"
start:
    HEADER -> after_header
after_header:
    LF -> body
    EOF
body:
    LF -> after_lf
    BODY -> body
    EOF
after_lf:
    LF -> after_lf_lf
    BODY -> body
    EOF
after_lf_lf:
    LF -> after_lf_lf {add "\n" to body}
    BODY -> body {add "\n\n" to body}
    HEADER -> after_header
    EOF {add "\n" to body}
This shouldn't be much effort to convert into an LR(2) grammar; the LR(1) grammar equivalent may be hell.
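For illustration, here is a rough Python sketch of the state machine above as a table-driven parser. It is not the html5lib-tests implementation, and several details are assumptions rather than anything the state machine spells out: the terminals are tried in the order HEADER, LF, BODY (so BODY never starts on a newline), a BODY token appends its own text to the current body, and a BODY seen in `after_lf` first restores the single "\n" that separated it from the previous line. The names `tokenize`, `parse` and `TRANSITIONS` are made up for this example.

```python
import re

# Terminals as written above; alternation order (HEADER, LF, BODY) is an assumption.
TOKEN_RE = re.compile(r"#(?P<HEADER>[^\n]+)|(?P<LF>\n)|(?P<BODY>[^#][^\n]*)")


def tokenize(text):
    """Yield (kind, value) pairs, ending with an explicit EOF token."""
    pos = 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise ValueError(f"no terminal matches at offset {pos}")
        yield m.lastgroup, m.group(m.lastgroup)
        pos = m.end()
    yield "EOF", ""


# (state, token kind) -> (next state, text added to the body before the token's own text)
TRANSITIONS = {
    ("start", "HEADER"): ("after_header", ""),
    ("after_header", "LF"): ("body", ""),
    ("after_header", "EOF"): ("done", ""),
    ("body", "LF"): ("after_lf", ""),
    ("body", "BODY"): ("body", ""),
    ("body", "EOF"): ("done", ""),
    ("after_lf", "LF"): ("after_lf_lf", ""),
    ("after_lf", "BODY"): ("body", "\n"),  # assumption: restore the line break
    ("after_lf", "EOF"): ("done", ""),
    ("after_lf_lf", "LF"): ("after_lf_lf", "\n"),
    ("after_lf_lf", "BODY"): ("body", "\n\n"),
    ("after_lf_lf", "HEADER"): ("after_header", ""),
    ("after_lf_lf", "EOF"): ("done", "\n"),
}


def parse(text):
    """Return a list of (header, body) tuples, one per HEADER token."""
    sections = []
    state = "start"
    for kind, value in tokenize(text):
        try:
            state, prefix = TRANSITIONS[state, kind]
        except KeyError:
            raise ValueError(f"unexpected {kind} in state {state}") from None
        if kind == "HEADER":
            sections.append([value, ""])
        else:
            # Only BODY tokens contribute their own text; LF and EOF add the prefix alone.
            sections[-1][1] += prefix + (value if kind == "BODY" else "")
    return [tuple(section) for section in sections]
```

Under these assumptions, `parse("#data\nfoo\nbar\n\n#errors\n")` gives one section per header, with the single line break between `foo` and `bar` kept in the first body.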
gsnedders commented
Basically what we want, for the terminals above, is:
test = HEADER LF (BODY | LF)*
tests = test (LF LF test)* LF
Which is LR(2).
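As a quick way to try the two rules out, here is a small Python recognizer. It only checks membership and builds no parse tree, so it says nothing about LR(k); it relies on the fact that the rules use only concatenation, alternation and repetition, so they can be written as a regular expression over the sequence of terminal kinds. The one-letter kind names (`H`, `L`, `B`), the function names, and the choice to try terminals in the order HEADER, LF, BODY are assumptions made for this sketch.

```python
import re

# Terminals from the issue, tried in the order HEADER, LF, BODY (ordering is an assumption).
TERMINAL_RE = re.compile(r"(?P<H>#[^\n]+)|(?P<L>\n)|(?P<B>[^#][^\n]*)")

# test  = HEADER LF (BODY | LF)*   ->  HL[BL]*
# tests = test (LF LF test)* LF    ->  HL[BL]*(LLHL[BL]*)*L
TESTS_RE = re.compile(r"HL[BL]*(?:LLHL[BL]*)*L")


def token_kinds(text):
    """Map the input to a string with one letter (H, L or B) per terminal."""
    kinds = []
    pos = 0
    while pos < len(text):
        m = TERMINAL_RE.match(text, pos)
        if m is None:
            raise ValueError(f"no terminal matches at offset {pos}")
        kinds.append(m.lastgroup)
        pos = m.end()
    return "".join(kinds)


def is_tests(text):
    """Return True if the terminal sequence of `text` matches the `tests` rule."""
    return TESTS_RE.fullmatch(token_kinds(text)) is not None
```

For example, `is_tests("#data\nfoo\n\n#errors\nbar\n")` is true, while an input missing the final LF, such as `"#data\nfoo"`, is rejected.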