mrkkrp/megaparsec

Newlines after tokens in `indentBlock`

Lev135 opened this issue · 3 comments

The documentation has this note about them:

> Tokens must not consume newlines after them. On the other hand, the first argument of this function must consume newlines among other white space characters.

However, this code:

```haskell
parser = L.indentBlock MP.space $ do
  h <- MP.string "head"
  pure $ L.IndentMany Nothing (\els -> pure (h, els)) (MP.string "item")
main = MP.parseTest parser "head\n item item"
```

produces this error message:

```
2:7:
  |
2 |  item item
  |       ^
incorrect indentation (got 7, should be equal to 2)
```

which doesn't seem good to me.

After replacing the item parser with `MP.string "item" <* MP.eol`, the error message becomes much better:

```
2:6:
  |
2 |  item item
  |      ^^
unexpected " i"
expecting end of line
```

As far as I can see, when parsing indented blocks we never want the next item to be on the same line as the previous one (that always fails the indentation check), and if we do want that behaviour, we should use something like `lineFold`.

So I wonder: why do the docs say not to consume the newline after an item? Are there any drawbacks to that approach?

Indeed, the error message could be better (even though it is technically correct). As for why a token should not consume the newline, the only edge case I can think of is when there is no newline at the end of the input. Such input should generally still parse, but if you demand a newline unconditionally, the last item will fail to parse.
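The edge case can be made concrete with a small sketch. It assumes a recent megaparsec with `String` input and `Void` as the custom error type; the names `strictBlock` and `run` are hypothetical. Each item demands a newline through `<* MP.eol`, so the block parses when the input ends with a newline and fails on the last item when it does not.

```haskell
import Data.Either (isLeft, isRight)
import Data.Void (Void)
import qualified Text.Megaparsec as MP
import qualified Text.Megaparsec.Char as MP
import qualified Text.Megaparsec.Char.Lexer as L

type Parser = MP.Parsec Void String

-- Hypothetical block parser whose items demand a newline unconditionally.
strictBlock :: Parser (String, [String])
strictBlock = L.indentBlock MP.space $ do
  h <- MP.string "head"
  pure $ L.IndentMany Nothing (\els -> pure (h, els)) (MP.string "item" <* MP.eol)

-- Run the block parser and require that it consumes the whole input.
run :: String -> Either (MP.ParseErrorBundle String Void) (String, [String])
run = MP.parse (strictBlock <* MP.eof) ""

main :: IO ()
main = do
  print (isRight (run "head\n item\n")) -- trailing newline: the item's eol succeeds
  print (isLeft (run "head\n item"))    -- no trailing newline: the last item's eol fails
```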

> the only edge case I can think of is when there is no newline at the end of the input

Yes, I use `(MP.eol <|> MP.eof)` there. If there are no other problems, maybe it would be reasonable to document this usage as the default?
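Here is a sketch of the eol-or-eof terminator, under the same assumptions (a recent megaparsec, `String` input, `Void` errors); `itemEnd`, `block` and `run` are hypothetical names. `MP.eol` returns the newline it consumed while `MP.eof` returns `()`, so the two alternatives have different result types; the sketch wraps `MP.eol` in `void` before combining them with `<|>`.

```haskell
import Control.Applicative ((<|>))
import Data.Functor (void)
import Data.Void (Void)
import qualified Text.Megaparsec as MP
import qualified Text.Megaparsec.Char as MP
import qualified Text.Megaparsec.Char.Lexer as L

type Parser = MP.Parsec Void String

-- Hypothetical item terminator: a newline, or end of input for the last item.
-- `void` discards eol's result so both branches have type Parser ().
itemEnd :: Parser ()
itemEnd = void MP.eol <|> MP.eof

block :: Parser (String, [String])
block = L.indentBlock MP.space $ do
  h <- MP.string "head"
  pure $ L.IndentMany Nothing (\els -> pure (h, els)) (MP.string "item" <* itemEnd)

-- Run the block parser and require that it consumes the whole input.
run :: String -> Either (MP.ParseErrorBundle String Void) (String, [String])
run = MP.parse (block <* MP.eof) ""

main :: IO ()
main = do
  print (run "head\n item\n item\n")    -- with a trailing newline
  print (run "head\n item\n item")      -- without a trailing newline
  MP.parseTest block "head\n item item" -- input from the issue: fails right after the first item
```

With this terminator the last item parses with or without a trailing newline, and two items on one line fail at the terminator instead of at the indentation check.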

Adjusted the docs in 7a4e3b3. The old phrasing was, indeed, too restrictive for no good reason. Whether or not to use `(MP.eol <|> MP.eof)` depends, I think, on the particular case. I'm a bit hesitant to present any particular approach here as "the way to go" in general.