m4rw3r/chomp

Infinite loop?

MarkSwanson opened this issue · 7 comments

skip_many() and many() do not seem to be propagating the incomplete state.
Or maybe the or combinator is always resetting the stream position and not propagating the error?

I expect the flow to be:

  • skip_many(all)
  • all OR tests b() and c() - both fail
  • all returns fail
  • skip_many returns fail <-- this does not happen ... infinite loop ...

Ideas?

Thanks!

i == "fffff".as_bytes(); // will never match any token...
parse!{i;
    skip_many(all);
    ...

pub fn c<'a>(i: Input<'a, u8>) -> U8Result<()> {
    parse!{i;
        take_while(is_whitespace);
        ret () } }

pub fn b<'i, 'a>(i: Input<'i, u8>, s: &'a str) -> U8Result<'i, ()> {
    parse!{i;
        take_while(is_whitespace);
        ret () } }

pub fn all<'a>(i: Input<'a, u8>) -> U8Result<()> {
    let s = String::new();
    parse!{i;
            b(&s) <|>
            c();
        ret () } }
```rust

When I log the result of parse!() (inside all()) I get a loop of:
ParseResult(Data(Input(END_OF_INPUT, [102, 102, 102]), []))

skip_many() seems to ignore END_OF_INPUT

Hi,

This is expected behaviour for the specific combination of parsers and combinators you have used above. The issue you have in the code is that none of the parsers you use (take_while essentially) require any characters to be consumed, same is true for most combinators (including skip_many). So the parser attempts to parse but it is not forced to read more than 0 input tokens to actually achieve a successful iteration:

Input
ffff   skip_many(all)
ffff       all()
ffff           b(&[])
ffff               take_while(is_whitespace)
ffff               <- Ok(&[])   // Nothing was matched yet ok since `take_while` has no lower bound
ffff           <- Ok(())
ffff       <- Ok(())  // `c` is skipped since we got success from `b`
ffff   // `skip_many` continues to iterate over `all` since `all` returned `Ok`

Note that none of the parsers above actually consume any input. So the buffer is still containing data, and skip_many will continue to iterate until either Error or Incomplete is encountered from the wrapped parser.

If you change some of these parsers and combinators to versions which require at least one matching element (take_while1, skip_many1 and so on), then you will see an immediate failure to parse "ffff" since a whitespace would be expected.

As for END_OF_INPUT, it is a flag which is used by combinators and some parsers to determine the behaviour in case they exhaust the slice inside of the Input. This prevents quite a few issues where a streaming parser would exit prematurely (Nom has this issue for pretty much all its combinators since it does not carry a flag like this). In the specific case of skip_many it will prevent Incomplete from being propagated if END_OF_INPUT is set, this allows for eg. skipping of a longer sequence while retrieving a partial remainder of the same at the end of input instead of only being allowed to fail (to instead force a full match you would sequence with eof after the skip_many to ensure that the buffer is empty after skipping).

Thanks Martin!

Changing some parsers to require at least one matching token doesn't work because all() wraps all of the parsers in 'or'.

Example: changing c() to token(b'f') doesn't work either. skip_many still never terminates.

pub fn c<'a>(i: Input<'a, u8>) -> U8Result<'a, ()> {
parse!{i;
token(b'f');
ret () } }

I don't want to change all() to use skip_many1() because the config file I'm parsing may never match any of the combinators.

Actually, even if I change all() to use skip_many1() it does not terminate.

matching eof after skip_many does not work because skip_many never terminates.

I'm actually transforming a config file into Rust code. Each parser / combinator does not have to return anything as they each will push transformed rust code onto a String. Example: b(i, s)

I thought testing for eof like this might work:
let aa = b(&s) <|> c() <|> eof();

but it does not.

Is there perhaps a way I can manually create my own end_of_input() check that checks State::Data(new(END_OF_INPUT, _) ? And then return an error manually?

The issue you are having is that at least one of the parsers in the alternation in all allows for a successful match without consuming input. It is not actually all which needs the skip_many1 but the b and c parsers which need to try to at least consume one token (eg. if you know that b is tagged with something as a start or that it consists of only numbers and so on). You changing c to token does not matter because b is evaluated first and is always successful, so that result is returned and since it is always successful skip_many will iterate endlessly.

skip_many (and other repeating combinators) will continue to repeat the nested parser until it returns Error or returns Incomplete. So there is nothing actually wrong with the main building-blocks you are using, but the smaller parsers need to actually try to match something (and therefore fail if they do not match) for it to be possible to use iterating combinators. If you use at least one take_while1 or token or similar parser which has a lower bound greater than zero in the basic building-blocks you will have no issues since you cannot get a zero-sized successful match.

Technically you do not need to consume tokens, but the parser must be able to fail to be able to stop the iteration. You could eg. use peek_next to branch on a character and return an error if it is not recognized.

Thanks for explaining Martin.
You nailed it perfectly.

Summary in case someone else comes across this:

It was a simple logic error on my part.
I was stubbing out parser functions that weren't consuming any input and would always succeed.
Since I was combining normal parsers with these stubs using an 'or' combinator, skip_many(or ...) would always succeed.

I'm enjoying using chomp!
After a couple of evenings I have a parser that is transforming parts of a proprietary configuration / control language into Rust, then compiled into a shared lib that gets dynamically loaded and called based on a network request. I hope to be able to talk about it someday.

Thanks for Chomp!

Awesome! If you have any more quesitons or suggestions, do not hesitate to ask!

Btw, a tip: I usually start with the small building blocks and go from there, then you are assured that the basic parts are working as intended. You can even start by writing tests for the small pieces first which means that it will be well tested (and leave the unimplemented parts using the unimplemented! macro) :)

#[test]
fn test_comment() {
    assert_eq!(parse_only(|i| matched_by(i, comment), b"# foobar"), Ok((&b"# foobar"[..], Value::Comment(b" foobar"))));
    // ...
}