m4rw3r/chomp

Many_till combinator seems to end up in an endless loop

Closed this issue · 2 comments

vegai commented

I wrote a naivish S-expression parser. I had a problem with the many_till combinator.

Here's the code: https://gist.github.com/anonymous/8a6fe19ab4e418eefad383455ae87a33 -- notice lines 56-60 where I would have wanted instead to do

many_till(parse_svalue, eof)

, but that just ended up in an endless loop with the simple "(a)" input. It's as if many_till did not notice that none of the three <|> -separated parsers matched. Is this a bug in chomp or am I misusing / missing something?

I suspect it is take_while in parse_value which is the culprit. take_while can match zero tokens which means that if nothing alphanumeric is encountered in your case it will yield an empty string as match. This causes many_till (and other combinators) to loop endlessly since they will keep matching 0 tokens and the input will never be empty. Try with a take_while1 instead which will require at lest one matching token.

I have improved and fixed the parser for S-expressions above: https://gist.github.com/m4rw3r/2f850c5f6895d1c5562adf84a94e48f2 I made the following changes:

  • Removed parse_empty_inner_list since it was just a specializaion
  • Replaced string with token, the latter is more efficient since no string-slice needs to be iterated
  • Changed parse_svalue to take care of leading whitespace
  • Use take_while1 to make sure parse_value actually reads at least one character into value. Previously take_while was used which can consume 0 tokens if none match, resulting in loops.
  • parse_svalue attempts parse_value first since it is a terminating token, makes it easier to make sure loops are not endless
  • Use the many combinator since parse_sexpr is nested, using many_till(..., eof) would not be nestable since "((foo))" would not parse since an <EOF> would be expected before the last ). To make use of the parse_sexpr parser and make sure it parses the whole slice, chain them using the skip (<*) operator: parse!{i; parse_sexpr <* eof}