Infinite loop?
MarkSwanson opened this issue · 7 comments
skip_many() and many() do not seem to be propagating the incomplete state.
Or maybe the or combinator is always resetting the stream position and not propagating the error?
I expect the flow to be:
- skip_many(all)
- all OR tests b() and c() - both fail
- all returns fail
- skip_many returns fail <-- this does not happen ... infinite loop ...
Ideas?
Thanks!
i == "fffff".as_bytes(); // will never match any token...
parse!{i;
skip_many(all);
...
pub fn c<'a>(i: Input<'a, u8>) -> U8Result<()> {
parse!{i;
take_while(is_whitespace);
ret () } }
pub fn b<'i, 'a>(i: Input<'i, u8>, s: &'a str) -> U8Result<'i, ()> {
parse!{i;
take_while(is_whitespace);
ret () } }
pub fn all<'a>(i: Input<'a, u8>) -> U8Result<()> {
let s = String::new();
parse!{i;
b(&s) <|>
c();
ret () } }
```rust
When I log the result of parse!() (inside all()) I get a loop of:
ParseResult(Data(Input(END_OF_INPUT, [102, 102, 102]), []))
skip_many() seems to ignore END_OF_INPUT
Hi,
This is expected behaviour for the specific combination of parsers and combinators you have used above. The issue you have in the code is that none of the parsers you use (take_while
essentially) require any characters to be consumed, same is true for most combinators (including skip_many
). So the parser attempts to parse but it is not forced to read more than 0 input tokens to actually achieve a successful iteration:
Input
ffff skip_many(all)
ffff all()
ffff b(&[])
ffff take_while(is_whitespace)
ffff <- Ok(&[]) // Nothing was matched yet ok since `take_while` has no lower bound
ffff <- Ok(())
ffff <- Ok(()) // `c` is skipped since we got success from `b`
ffff // `skip_many` continues to iterate over `all` since `all` returned `Ok`
Note that none of the parsers above actually consume any input. So the buffer is still containing data, and skip_many
will continue to iterate until either Error
or Incomplete
is encountered from the wrapped parser.
If you change some of these parsers and combinators to versions which require at least one matching element (take_while1
, skip_many1
and so on), then you will see an immediate failure to parse "ffff"
since a whitespace would be expected.
As for END_OF_INPUT
, it is a flag which is used by combinators and some parsers to determine the behaviour in case they exhaust the slice inside of the Input
. This prevents quite a few issues where a streaming parser would exit prematurely (Nom has this issue for pretty much all its combinators since it does not carry a flag like this). In the specific case of skip_many
it will prevent Incomplete
from being propagated if END_OF_INPUT
is set, this allows for eg. skipping of a longer sequence while retrieving a partial remainder of the same at the end of input instead of only being allowed to fail (to instead force a full match you would sequence with eof
after the skip_many
to ensure that the buffer is empty after skipping).
Thanks Martin!
Changing some parsers to require at least one matching token doesn't work because all() wraps all of the parsers in 'or'.
Example: changing c() to token(b'f') doesn't work either. skip_many still never terminates.
pub fn c<'a>(i: Input<'a, u8>) -> U8Result<'a, ()> {
parse!{i;
token(b'f');
ret () } }
I don't want to change all() to use skip_many1() because the config file I'm parsing may never match any of the combinators.
Actually, even if I change all() to use skip_many1() it does not terminate.
matching eof after skip_many does not work because skip_many never terminates.
I'm actually transforming a config file into Rust code. Each parser / combinator does not have to return anything as they each will push transformed rust code onto a String. Example: b(i, s)
I thought testing for eof like this might work:
let aa = b(&s) <|> c() <|> eof();
but it does not.
Is there perhaps a way I can manually create my own end_of_input() check that checks State::Data(new(END_OF_INPUT, _) ? And then return an error manually?
The issue you are having is that at least one of the parsers in the alternation in all
allows for a successful match without consuming input. It is not actually all
which needs the skip_many1
but the b
and c
parsers which need to try to at least consume one token (eg. if you know that b
is tagged with something as a start or that it consists of only numbers and so on). You changing c
to token
does not matter because b
is evaluated first and is always successful, so that result is returned and since it is always successful skip_many
will iterate endlessly.
skip_many
(and other repeating combinators) will continue to repeat the nested parser until it returns Error
or returns Incomplete
. So there is nothing actually wrong with the main building-blocks you are using, but the smaller parsers need to actually try to match something (and therefore fail if they do not match) for it to be possible to use iterating combinators. If you use at least one take_while1
or token
or similar parser which has a lower bound greater than zero in the basic building-blocks you will have no issues since you cannot get a zero-sized successful match.
Technically you do not need to consume tokens, but the parser must be able to fail to be able to stop the iteration. You could eg. use peek_next
to branch on a character and return an error if it is not recognized.
Thanks for explaining Martin.
You nailed it perfectly.
Summary in case someone else comes across this:
It was a simple logic error on my part.
I was stubbing out parser functions that weren't consuming any input and would always succeed.
Since I was combining normal parsers with these stubs using an 'or' combinator, skip_many(or ...) would always succeed.
I'm enjoying using chomp!
After a couple of evenings I have a parser that is transforming parts of a proprietary configuration / control language into Rust, then compiled into a shared lib that gets dynamically loaded and called based on a network request. I hope to be able to talk about it someday.
Thanks for Chomp!
Awesome! If you have any more quesitons or suggestions, do not hesitate to ask!
Btw, a tip: I usually start with the small building blocks and go from there, then you are assured that the basic parts are working as intended. You can even start by writing tests for the small pieces first which means that it will be well tested (and leave the unimplemented parts using the unimplemented!
macro) :)
#[test]
fn test_comment() {
assert_eq!(parse_only(|i| matched_by(i, comment), b"# foobar"), Ok((&b"# foobar"[..], Value::Comment(b" foobar"))));
// ...
}