BNFC/bnfc

Parse error should close layout block

andreasabel opened this issue

BNFC's layout handling does not implement the following clause, taken from the Haskell 98 report:

A close brace is also inserted whenever the syntactic category containing the layout list ends; that is, if an illegal lexeme is encountered at a point where a close brace would be legal, a close brace is inserted.

Consider this small (artificial) expression grammar with a sum construct that can use layout.

ETimes.   Exp   ::= Exp "*" Exp1;
ESum.     Exp1  ::= "sum" "{" [Exp] "}";
EInt.     Exp1  ::= Integer;

_.        Exp   ::= Exp1;
_.        Exp1  ::= "begin" Exp "end";

separator Exp ";";

layout "sum";

As BNFC has a workaround for parentheses "(...)", we use "begin ... end" here instead.

This grammar handles input with explicit braces, e.g. sum { 1; 2; sum { 3;4;5 } * 6 } * 7. However, it fails on the following input, which relies on layout:

begin sum
  begin 1 end
  2 end * 3

The correct reconstruction of the block would be:

begin sum
  { begin 1 end
  ; 2 } end * 3

However, the token stream generated by the layout pass does not respect the bracketing begin ... end:

1:01	"begin"
1:07	"sum"
1:11	"{"
2:03	"begin"
2:09	"1"
2:11	"end"
2:15	";"
3:03	"2"
3:05	"end"
3:09	"*"
3:11	"3"
3:13	"}"

This is because the closing brace } is inserted mechanically according to the off-side rule (here, only at the end of the input). Instead, the parser should insert it dynamically to recover from the parse error caused by the end token in line 3. In other words, a closing brace should be inserted not only on dedentation but also on parse errors.
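To make the difference concrete, here is a minimal Python sketch of a layout pass for the example above. It is not BNFC's implementation: the names tokenize, resolve and recover are hypothetical, explicit braces in the input are not handled, and the parse-error clause is only approximated. With recover=False the pass applies the off-side rule alone. With recover=True it tracks begin/end nesting per layout block and treats an end without a matching begin inside the innermost block as the "illegal lexeme" that closes the block. A real fix would need feedback from the parser instead of this bracket counting.

```python
import re

LAYOUT_WORDS = {"sum"}        # from the pragma: layout "sum";
OPEN, CLOSE = "begin", "end"  # bracket pair of the example grammar


def tokenize(src):
    """Split on whitespace, recording 1-based (line, column, text)."""
    return [(ln, m.start() + 1, m.group())
            for ln, line in enumerate(src.splitlines(), start=1)
            for m in re.finditer(r"\S+", line)]


def resolve(src, recover=False):
    """Insert "{", ";" and "}" into the token stream of src."""
    out = []
    blocks = []      # stack of [indent column, unmatched begins in block]
    pending = False  # the previous token was a layout keyword
    last_line = 0
    for line, col, text in tokenize(src):
        if pending:
            # The token after a layout keyword fixes the block's column.
            out.append("{")
            blocks.append([col, 0])
            pending = False
        elif line != last_line:
            # Off-side rule: close on dedent, separate on equal indent.
            while blocks and col < blocks[-1][0]:
                blocks.pop()
                out.append("}")
            if blocks and col == blocks[-1][0]:
                out.append(";")
        if recover and blocks:
            if text == OPEN:
                blocks[-1][1] += 1
            elif text == CLOSE:
                # An end with no begin in this block would be a parse
                # error here, so close the block(s) before emitting it.
                while blocks and blocks[-1][1] == 0:
                    blocks.pop()
                    out.append("}")
                if blocks:
                    blocks[-1][1] -= 1
        out.append(text)
        pending = text in LAYOUT_WORDS
        last_line = line
    if pending:
        out += ["{", "}"]  # layout keyword at end of input: empty block
    out.extend("}" for _ in blocks)  # close blocks still open at the end
    return " ".join(out)


SRC = "begin sum\n  begin 1 end\n  2 end * 3"
print(resolve(SRC))                # begin sum { begin 1 end ; 2 end * 3 }
print(resolve(SRC, recover=True))  # begin sum { begin 1 end ; 2 } end * 3
```

The first line of output matches the token stream above; the second matches the intended reconstruction of the block.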

Adding a layout stop "end" pragma does not help here, as the layout block is then closed too early: before the first end rather than before the second one:

1:01	"begin"
1:07	"sum"
1:11	"{"
2:03	"begin"
2:09	"1"
2:11	"}"
2:11	"end"
3:03	"2"
3:05	"end"
3:09	"*"
3:11	"3"
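The early close can be reproduced with a small Python sketch of the off-side rule extended by stop words. It only mirrors the behaviour visible in the token stream above, under the assumption that a stop word closes the innermost open layout block regardless of nesting; it is not BNFC's implementation, and resolve_with_stop is a hypothetical name.

```python
import re

LAYOUT_WORDS = {"sum"}  # layout "sum";
STOP_WORDS = {"end"}    # layout stop "end";


def resolve_with_stop(src):
    """Off-side rule plus stop words; returns the resolved token string."""
    out = []
    blocks = []      # stack of indentation columns of open layout blocks
    pending = False  # the previous token was a layout keyword
    last_line = 0
    for ln, line in enumerate(src.splitlines(), start=1):
        for m in re.finditer(r"\S+", line):
            col, text = m.start() + 1, m.group()
            if pending:
                out.append("{")
                blocks.append(col)
                pending = False
            elif ln != last_line:
                while blocks and col < blocks[-1]:
                    blocks.pop()
                    out.append("}")
                if blocks and col == blocks[-1]:
                    out.append(";")
            if text in STOP_WORDS and blocks:
                # The stop word closes the innermost block, even if it
                # belongs to a begin ... end pair inside that block.
                blocks.pop()
                out.append("}")
            out.append(text)
            pending = text in LAYOUT_WORDS
            last_line = ln
    if pending:
        out += ["{", "}"]  # layout keyword at end of input: empty block
    out.extend("}" for _ in blocks)
    return " ".join(out)


SRC = "begin sum\n  begin 1 end\n  2 end * 3"
print(resolve_with_stop(SRC))  # begin sum { begin 1 } end 2 end * 3
```

Because the stop word is matched purely lexically, the pass cannot tell the end that closes the inner begin from the end that should terminate the layout block, which is why a stop word alone cannot solve the problem.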