purescript-contrib/purescript-parsing

Can we have more typesafety w.r.t. many-like combinators and infinite loops?

Closed this issue · 7 comments

Is your change request related to a problem? Please describe.
many p will get stuck in a loop if p possibly doesn't consume any input but still succeeds.

One could argue that the parser is bogus in that case. However, it would be nice to have some compile time safety guarantees for this, because a hanging application isn't fun to deal with, even if it occurs while developing.

Some solution directions that cross my mind

  • Utilizing the typesystem in some way that makes it impossible to pass a parser that possibly consumes no input but still succeeds.
  • Changing the semantics of many such that it stops (or fails?) whenever the position of the parser state hasn't moved after applying p.

I like your thinking about this and it would be nice to solve this problem. Some further notes:

  • Utilizing the typesystem in some way that makes it impossible to pass a parser that possibly consumes no input but still succeeds.

I can think of one useful parser that consumes no input and succeeds: eof.

  • Changing the semantics of many such that it stops (or fails?) whenever the position of the parser state hasn't moved after applying p.

Because ParserT is a monad transformer, it's possible to imagine a ParserT s State a which consumes no input of the first parse but does consume input on the second parse, due to some mechanics of the internal state.

I've given a little bit of thought. I like the direction of the second solution (stopping the loop when no input is consumed) the best for the following reasons.

  • It does not impose new complexity on the construction of parsers. Parsers can be constructed like before, except for the fact that no infinite loop will occur for non-consuming parsers.
  • eof is indeed a useful parser and it would be nice if many could deal with it. You wouldn't pass it directly to many of course, but it can certainly be used in a more complex parser case. Forbidding such a parser construction feels like an unnecessary constraint.

Because ParserT is a monad transformer, it's possible to imagine a ParserT s State a which consumes no input of the first parse but does consume input on the second parse, due to some mechanics of the internal state.

I can't imagine a practical situation for this, but you are correct. Perhaps we can expose two flavors of many, a terminating one and one that will just keep looping until failure.

Perhaps we can expose two flavors of many, a terminating one and one that will just keep looping until failure.

I like the idea of having a flavor of many which would guarantee that it either makes forward progress or fails.

We would have to think about which combinators we want to supply this flavor for. Here are some candidates:

  • Data.Array.many
  • Data.List.many
  • Data.Array.some
  • Data.List.some
  • Data.List.manyRec
  • Data.List.someRec
  • Data.Array.NonEmpty.some
  • Text.Parsing.Parser.Combinators.many1
  • Text.Parsing.Parser.Combinators.skipMany
  • Text.Parsing.Parser.Combinators.skipMany1
  • Text.Parsing.Parser.Combinators.manyTill
  • Text.Parsing.Parser.Combinators.many1Till

That's a pretty long list. But we could start with just a few of those, and then add more if there was popular demand.

Here's another idea which just occurred to me: a combinator consume :: ParserT s m a -> ParserT s m a which forces a parser to fail if the parser would succeed while consuming no input. (Is this parser published somewhere else with a different name? I can't find it with two minutes of Googling.)

Here's an implementation for String:

consume :: forall m a. Monad m => ParserT String m a -> ParserT String m a
consume p = do
  ParseState input1 _ _ <- get
  x <- p
  ParseState input2 _ _ <- get
  if CodeUnits.length input2 < CodeUnits.length input1 then pure x else fail "Consumed no input."

So if we have a parserWhichMayNotConsume then instead of calling many parserWhichMayNotConsume we can call many (consume parserWhichMayNotConsume) and that will never hang.

I like your idea of a consume combinator. It makes writing terminating flavours of many-like combinators very concise, but I'm not even sure that it would be necessary when having that combinator available. Perhaps extending the documentation of many and cousins to mention the termination pitfalls and the consume function is enough. It's name conflicts with an existing function though.

I see you included some non-parser functions in your list, like Data.Array.many. I'm not familiar with those. Do they suffer from the same problem? The documentation states that the Lazy constraint ensures termination, but I'm not sure I understand that fully, nor am I in a position to verify that claim.

(The consume implementation I gave will probably not work because it doesn't take into account the parser state's consumed flag.)

(The consume implementation I gave will probably not work because it doesn't take into account the parser state's consumed flag.)

I changed my mind, I think that the consume implementation which I wrote above is good.

The internal consumed flag in purescript-parsing really means, “are we committed to this parsing branch, yes or no?” So the consumed flag resulting from the consume combinator will be whatever the consumed flag resulting from the argument parser was. Which is right.

What we want to assert with the consume combinator is that we’re making progress in the input. Which is a different concern than whether we’re committed to this parsing branch.

Nice @jamesdbrock . Perhaps the existence of the advance function could be mentioned in the docs of all relevant many-like parser combinators? So that users will know that the many combinator will hang when there is a chance that their parser does not consume, and how to solve that.