Clear documentation on "streaming" meaning

Question

douglas-raillard-arm opened this issue 2 years ago

The main piece of documentation on streaming parsers that I found is this one:
https://docs.rs/nom/latest/nom/#streaming--complete

While documenting the behavior of nom, it might be a good idea to stress that nom is not streaming in the sense that it can never process arbitrary input in constant memory, no matter what the parser does (e.g. many0_count could, in an ideal world, run in constant memory). This is because the only way to retry a partial parse is to reparse everything from the beginning, as opposed to resuming where the parser actually stopped, as Haskell's scanner parser combinator library does:
https://hackage.haskell.org/package/scanner-0.3.1/docs/Scanner.html#t:Result
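To make the cost concrete, here is a std-only Rust model of that retry loop (not nom code). The hypothetical `parse_from_start` stands in for a streaming parser of `("ab")* "."` (think many0_count followed by a terminator), and the hypothetical `drive` appends each incoming chunk to a buffer and reparses from byte 0 whenever the parser reports Incomplete. The buffer grows with the input and, when data arrives in many small chunks, the total work grows roughly quadratically.

```rust
/// Result of one parse attempt, modelled loosely on nom's streaming semantics.
#[derive(Debug, PartialEq)]
enum Outcome {
    /// Number of "ab" items found before the '.' terminator.
    Done(usize),
    /// The buffer ended before the terminator: more input is needed.
    Incomplete,
    /// An unexpected byte was found.
    Error,
}

/// Hypothetical streaming parser for `("ab")* "."`, always run from the
/// start of `buf`. Returns the outcome and how many bytes it examined.
fn parse_from_start(buf: &[u8]) -> (Outcome, usize) {
    let mut i = 0;
    let mut count = 0;
    loop {
        match buf.get(i) {
            None => return (Outcome::Incomplete, i),
            Some(b'.') => return (Outcome::Done(count), i + 1),
            Some(b'a') => match buf.get(i + 1) {
                None => return (Outcome::Incomplete, i + 1),
                Some(b'b') => {
                    count += 1;
                    i += 2;
                }
                Some(_) => return (Outcome::Error, i + 2),
            },
            Some(_) => return (Outcome::Error, i + 1),
        }
    }
}

/// Hypothetical driver: on Incomplete, append the next chunk and reparse the
/// whole buffer. Returns (outcome, final buffer length, total bytes examined).
fn drive(chunks: &[&str]) -> (Outcome, usize, usize) {
    let mut buf: Vec<u8> = Vec::new();
    let mut examined = 0;
    for chunk in chunks {
        buf.extend_from_slice(chunk.as_bytes());
        let (outcome, seen) = parse_from_start(&buf);
        examined += seen;
        if outcome != Outcome::Incomplete {
            return (outcome, buf.len(), examined);
        }
    }
    (Outcome::Incomplete, buf.len(), examined)
}

fn main() {
    // 100 chunks of "ab" followed by the terminator.
    let chunks: Vec<&str> = std::iter::repeat("ab")
        .take(100)
        .chain(std::iter::once("."))
        .collect();
    let (outcome, buffered, examined) = drive(&chunks);
    println!("{outcome:?}, buffered {buffered} bytes, examined {examined} bytes");
}
```

Under this model, 100 chunks of `ab` followed by `.` leave a 201-byte buffer and 10301 bytes examined in total, and no byte can be dropped before the terminator arrives, whereas a parser that could resume would only need to keep a counter.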

This makes nom's streaming mode much less useful. In addition, the inability to distinguish between EOF and "there might be more input coming" means that any parser ending with an optional streaming sub-parser will always wait for more input, even when there is none:
#271

One suggestion in that thread is to use complete() to make sure any optional parser is complete, but that breaks reuse of the parser in places where it is expected to be fully streaming. One problematic example is a parser ending with separated_list0. Either:

the separator is a complete parser: it will correctly detect EOF, but if the input happens to be truncated at a separator boundary, separated_list0 will wrongly terminate successfully (I guess, untested).
the separator is a streaming parser: separated_list0 will simply never parse successfully, as it will always wait for more input after the last item.
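Both cases can be reproduced with a small std-only Rust model. It is not nom's implementation; it is only meant to mirror the control flow assumed here for nom's separated_list0, where an Error from the separator ends the list successfully and Incomplete is propagated. The names `Res`, `Mode`, `digit`, `comma` and `digit_list` are hypothetical: items are single ASCII digits, the separator is `,`, and `Mode` chooses whether the separator treats empty input as Error (complete) or Incomplete (streaming).

```rust
/// Simplified parser result: a value, a recoverable error, or a request for
/// more input.
#[derive(Debug, PartialEq)]
enum Res<T> {
    Ok(T),
    Error,
    Incomplete,
}

/// How the separator reacts to running out of input.
#[derive(Clone, Copy)]
enum Mode {
    Complete,
    Streaming,
}

/// Hypothetical streaming item parser: one ASCII digit; empty input is Incomplete.
fn digit(i: &[u8]) -> Res<(u8, &[u8])> {
    match i.first() {
        None => Res::Incomplete,
        Some(d) if d.is_ascii_digit() => Res::Ok((d - b'0', &i[1..])),
        Some(_) => Res::Error,
    }
}

/// Hypothetical separator parser for ','; `mode` decides what empty input means.
fn comma(i: &[u8], mode: Mode) -> Res<&[u8]> {
    match (i.first(), mode) {
        (None, Mode::Complete) => Res::Error,
        (None, Mode::Streaming) => Res::Incomplete,
        (Some(b','), _) => Res::Ok(&i[1..]),
        (Some(_), _) => Res::Error,
    }
}

/// Hypothetical model of separated_list0(comma, digit): an Error from the
/// separator or item ends the list successfully, Incomplete is passed up.
fn digit_list(mut i: &[u8], mode: Mode) -> Res<Vec<u8>> {
    let mut out = Vec::new();
    match digit(i) {
        Res::Ok((d, rest)) => {
            out.push(d);
            i = rest;
        }
        Res::Error => return Res::Ok(out),
        Res::Incomplete => return Res::Incomplete,
    }
    loop {
        let after_sep = match comma(i, mode) {
            Res::Ok(rest) => rest,
            Res::Error => return Res::Ok(out),
            Res::Incomplete => return Res::Incomplete,
        };
        match digit(after_sep) {
            Res::Ok((d, rest)) => {
                out.push(d);
                i = rest;
            }
            Res::Error => return Res::Ok(out),
            Res::Incomplete => return Res::Incomplete,
        }
    }
}

fn main() {
    // Suppose the full input is b"1,2,3" but only b"1,2" has arrived so far.
    let partial: &[u8] = b"1,2";
    println!("complete sep:  {:?}", digit_list(partial, Mode::Complete));
    println!("streaming sep: {:?}", digit_list(partial, Mode::Streaming));
}
```

In this model, the truncated input `b"1,2"` yields `Ok([1, 2])` with the complete separator even though a third item is still on its way, while the streaming separator yields `Incomplete`, exactly as it would if `b"1,2"` really were the whole input. The streaming version only succeeds when some non-separator byte follows the last item.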

If I'm right, this basically means the only workable solution is to use complete parsers throughout and provide all the data in one chunk. In that case there is no real disadvantage in doing so, since a streaming parser would eventually require the entire input in memory anyway (possibly memory-mapped).

My suggestions are:

Make it obvious in the documentation what can be achieved using streaming parsers and, critically, what cannot.
Add a warning to the separated_list0 documentation stating that a streaming parser should not be used for sep if separated_list0 is the last sub-parser.
If possible, detect and forbid streaming parsers that end with an optional streaming sub-parser (at runtime or using type-level tricks).
Maybe provide a way to signal EOF along with the input, e.g. by allowing (&[u8], bool) input where the boolean indicates whether the input is complete. If it is, a streaming parser should never ask for more input and should simply fail.
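A minimal std-only sketch of that last suggestion, assuming a hypothetical input type `Input` that pairs the bytes received so far with an `is_complete` flag (this is not an existing nom API). The hypothetical `tag` built on it returns Incomplete only when the data is a strict prefix of the expected bytes and EOF has not been signalled, so an optional sub-parser at the end of a parser (the equally hypothetical `opt_tag`) can report "absent" once the input is known to be complete.

```rust
/// Hypothetical input type: the bytes received so far, plus a flag saying
/// whether they are the entire input (i.e. EOF has been reached).
#[derive(Clone, Copy)]
struct Input<'a> {
    data: &'a [u8],
    is_complete: bool,
}

/// Simplified result carrying the remaining bytes on success.
#[derive(Debug, PartialEq)]
enum Res<'a, T> {
    Ok(T, &'a [u8]),
    Error,
    Incomplete,
}

/// Hypothetical tag: matches `expected` at the start of the input and asks
/// for more input only if the data is a strict prefix of `expected` and EOF
/// has not been signalled.
fn tag<'a>(expected: &[u8], input: Input<'a>) -> Res<'a, ()> {
    if input.data.starts_with(expected) {
        return Res::Ok((), &input.data[expected.len()..]);
    }
    if expected.starts_with(input.data) {
        // Could still match if more bytes arrive.
        return if input.is_complete { Res::Error } else { Res::Incomplete };
    }
    Res::Error
}

/// Hypothetical optional tag: Error means "absent" (nothing consumed),
/// Incomplete is propagated unchanged.
fn opt_tag<'a>(expected: &[u8], input: Input<'a>) -> Res<'a, bool> {
    match tag(expected, input) {
        Res::Ok((), rest) => Res::Ok(true, rest),
        Res::Error => Res::Ok(false, input.data),
        Res::Incomplete => Res::Incomplete,
    }
}

fn main() {
    // Nothing left after the last mandatory item: is an optional ';' coming?
    let waiting = Input { data: b"", is_complete: false };
    let at_eof = Input { data: b"", is_complete: true };
    println!("{:?}", opt_tag(b";", waiting)); // Incomplete
    println!("{:?}", opt_tag(b";", at_eof)); // Ok(false, [])
}
```

The same parser code serves both situations: the caller decides, through the flag, whether running out of bytes means "wait" or "stop".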

Another (challenging) route would be to make nom fully streaming: Err::Incomplete could carry a closure that, when called, resumes the parser with the extra input. AFAIK the only ways to achieve that in Rust would be either to define all parsers with a macro such as mdo, or to use async/await and the suspended future's poll method as a way to resume parsing from a given point in the code.
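As a rough illustration of the closure-based route, here is a std-only Rust sketch (not nom code) in the spirit of the `More` constructor of the scanner `Result` type linked above. When a chunk runs out, the hypothetical parser `count_ab` returns `Step::More` holding a boxed closure that captures only the parser's state, so each chunk can be dropped once processed and memory use stays constant. It counts consecutive `ab` items, as an idealized many0_count would; passing `None` to the continuation is this sketch's (assumed) way of signalling EOF.

```rust
/// Outcome of feeding input to a resumable parser.
enum Step {
    /// Parsing finished with this many complete items.
    Done(usize),
    /// The chunk ran out: call the continuation with the next chunk, or
    /// with `None` to signal end of input.
    More(Box<dyn FnOnce(Option<&[u8]>) -> Step>),
}

/// Hypothetical resumable parser counting consecutive "ab" items. Only
/// `count` and `pending_a` survive between chunks, so memory use does not
/// grow with the input.
fn count_ab(mut count: usize, mut pending_a: bool, chunk: Option<&[u8]>) -> Step {
    let Some(chunk) = chunk else {
        // End of input: a dangling 'a' is not a complete item.
        return Step::Done(count);
    };
    for &b in chunk {
        match (pending_a, b) {
            (false, b'a') => pending_a = true,
            (true, b'b') => {
                count += 1;
                pending_a = false;
            }
            // Any other byte ends the run of items.
            _ => return Step::Done(count),
        }
    }
    // Resume exactly where this chunk stopped, without keeping the chunk.
    Step::More(Box::new(move |next: Option<&[u8]>| {
        count_ab(count, pending_a, next)
    }))
}

/// Hypothetical driver: feeds chunks one at a time, then signals EOF if needed.
fn run(chunks: &[&str]) -> usize {
    let mut step = Step::More(Box::new(|c: Option<&[u8]>| count_ab(0, false, c)));
    for chunk in chunks {
        step = match step {
            Step::Done(n) => return n,
            Step::More(resume) => resume(Some(chunk.as_bytes())),
        };
    }
    match step {
        Step::Done(n) => n,
        Step::More(resume) => match resume(None) {
            Step::Done(n) => n,
            Step::More(_) => unreachable!("count_ab always finishes at EOF"),
        },
    }
}

fn main() {
    // The "ab" items are deliberately split across chunk boundaries.
    println!("{}", run(&["ab", "a", "bab", "x"])); // prints 3
}
```

Compared with a reparse-from-scratch loop, nothing is re-examined here: each byte is looked at once, and an item split across two chunks is completed from the saved state.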