Streaming utilities and generalized input management
m4rw3r opened this issue · 0 comments
- Datasources
  - io::Read
  - Iterator
- Buffers
  - Fixed size
  - Growing
  - Capped
- Parse sources
  - Slice source
  - DataSource source
  - Manually managed variant
- Trait for sources
- ParseError
Since Chomp is a slice-based parser, it cannot use Iterator- or Read-based input directly. It has to work on slices, since that is the most efficient way to allow zero-copy parsing [1].
Buffers
Buffers generic over std::io::Read. These should automatically fill the buffer when necessary, if configured to do so. If a parse fails with the data acquired from the Read source, but the read returned more than zero bytes, a Retry error should be returned to indicate that another attempt to read and parse the data is necessary.
Growable buffer
Useful for parsing trusted data, or where the amount of memory allocated for the parser buffer does not matter. It should have an optional maximum buffer size. If this limit is hit and the parse still fails, it should error in the same way as the fixed size buffer.
TODO: More spec
Fixed size buffer
Useful when parsing data from an unknown source. It should allocate a single slab on construction and then attempt to parse with at most the full size of that slab. If a parser still wants more data than the fixed-size buffer can provide, it should return an error indicating that the operation could not complete, along with the total amount of data the parser requested (i.e. including the currently used part of the fixed buffer).
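The behaviour described above could look roughly like the sketch below. FixedBuffer, FillError and request are hypothetical names invented for illustration, not Chomp APIs, and the sketch only covers filling, not parsing.

```rust
use std::io::{self, Read};

/// Hypothetical fixed-size buffer: one allocation at construction, never grows.
pub struct FixedBuffer<R: Read> {
    source: R,
    storage: Box<[u8]>,
    /// Number of valid bytes at the start of `storage`.
    used: usize,
}

/// Hypothetical error type for this sketch.
#[derive(Debug)]
pub enum FillError {
    /// The parser asked for more than the buffer can ever hold.
    /// `requested` is the total, including bytes already buffered.
    TooLarge { requested: usize, capacity: usize },
    /// The underlying reader failed.
    Io(io::Error),
}

impl<R: Read> FixedBuffer<R> {
    /// Allocate the single slab up front.
    pub fn with_capacity(source: R, capacity: usize) -> Self {
        FixedBuffer {
            source,
            storage: vec![0u8; capacity].into_boxed_slice(),
            used: 0,
        }
    }

    /// Try to make at least `requested` bytes available in total.
    /// Returns everything buffered so far; this may be fewer than
    /// `requested` bytes if the source is exhausted.
    pub fn request(&mut self, requested: usize) -> Result<&[u8], FillError> {
        if requested > self.storage.len() {
            return Err(FillError::TooLarge {
                requested,
                capacity: self.storage.len(),
            });
        }
        while self.used < requested {
            let n = self
                .source
                .read(&mut self.storage[self.used..])
                .map_err(FillError::Io)?;
            if n == 0 {
                break; // source exhausted
            }
            self.used += n;
        }
        Ok(&self.storage[..self.used])
    }
}
```

Reporting the requested total in the error lets the caller decide whether a larger buffer would have been enough.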
Manually managed buffer
A buffer where the user controls when to attempt to fill the buffer. Useful for e.g. cooperative multitasking, where an input could otherwise completely saturate a parser and prevent any other operation from running.
This should probably be a configuration option on the existing buffers. It will cause the buffer to skip filling automatically and only return a Retry until the user has called a method asking the buffer to fill itself.
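A minimal sketch of the manually managed variant, assuming a hypothetical ManualBuffer type with try_get and fill methods (none of these names come from Chomp). Here None from try_get stands in for the Retry error, and the source is only touched when fill is called.

```rust
use std::io::{self, Read};

/// Hypothetical buffer that never reads from its source on its own.
pub struct ManualBuffer<R: Read> {
    source: R,
    data: Vec<u8>,
    /// Maximum number of bytes to read per `fill` call.
    chunk: usize,
}

impl<R: Read> ManualBuffer<R> {
    pub fn new(source: R, chunk: usize) -> Self {
        ManualBuffer { source, data: Vec::new(), chunk }
    }

    /// Never touches the source. Returns the buffered bytes if at least
    /// `needed` are available, otherwise `None` (the "Retry" case).
    pub fn try_get(&self, needed: usize) -> Option<&[u8]> {
        if self.data.len() >= needed {
            Some(&self.data[..])
        } else {
            None
        }
    }

    /// Explicit fill: performs one read of at most `chunk` bytes and
    /// returns how many bytes were added.
    pub fn fill(&mut self) -> io::Result<usize> {
        let old = self.data.len();
        self.data.resize(old + self.chunk, 0);
        match self.source.read(&mut self.data[old..]) {
            Ok(n) => {
                self.data.truncate(old + n);
                Ok(n)
            }
            Err(e) => {
                self.data.truncate(old);
                Err(e)
            }
        }
    }
}
```

Because fill does a single bounded read, a scheduler can interleave other work between calls.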
Source trait
Trait which enables different buffer implementations to be treated the same.
The internal storage is not necessarily tied to the source itself (e.g. a &[u8] supplied by the user), so the trait needs two lifetimes: one for the struct implementing Source and one for the data itself, which the created T and E depend on.
pub trait Source<'a, 'i, I> {
    fn parse<F, T, E>(&'a mut self, f: F) -> Result<T, ParseError<'i, I, E>>
      where F: FnOnce(Input<'i, I>) -> ParseResult<'i, I, T, E>,
            T: 'i,
            E: 'i;
}
This trait is not compatible with for-loops: self is borrowed mutably, and the resulting T and E keep at least the internal storage from being modified, which prevents another mutable borrow of self. A loop + match, a while loop, or macros will have to be used instead.
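The loop + match pattern might look like the sketch below. It uses a simplified stand-in, not the trait above: SliceSource, Step, line and collect_lines are all hypothetical, and the parser closure returns a value plus the number of bytes consumed instead of Chomp's ParseResult.

```rust
/// Hypothetical outcome of one parse attempt.
pub enum Step<T> {
    /// A value was parsed.
    Item(T),
    /// The parser needs more data than the source has left.
    Incomplete,
    /// Nothing left to parse.
    End,
}

/// Hypothetical source over a user-supplied slice. The data lifetime `'i`
/// is separate from the borrow of the source itself.
pub struct SliceSource<'i> {
    rest: &'i [u8],
}

impl<'i> SliceSource<'i> {
    pub fn new(data: &'i [u8]) -> Self {
        SliceSource { rest: data }
    }

    /// Run `f` on the remaining input. `f` returns the parsed value and
    /// the number of bytes consumed, or `None` if it needs more data.
    pub fn parse<T, F>(&mut self, f: F) -> Step<T>
    where
        F: FnOnce(&'i [u8]) -> Option<(T, usize)>,
    {
        let rest = self.rest;
        if rest.is_empty() {
            return Step::End;
        }
        match f(rest) {
            Some((value, consumed)) => {
                self.rest = &rest[consumed..];
                Step::Item(value)
            }
            None => Step::Incomplete,
        }
    }
}

/// Example parser: one newline-terminated line, without the newline.
fn line(input: &[u8]) -> Option<(&[u8], usize)> {
    let end = input.iter().position(|&b| b == b'\n')?;
    Some((&input[..end], end + 1))
}

/// Drive the source with loop + match until it ends or stalls.
pub fn collect_lines(data: &[u8]) -> Vec<&[u8]> {
    let mut src = SliceSource::new(data);
    let mut out = Vec::new();
    loop {
        match src.parse(line) {
            Step::Item(l) => out.push(l),
            // A slice source cannot obtain more data, so stop here.
            Step::Incomplete | Step::End => break,
        }
    }
    out
}
```

The returned lines borrow from the input slice rather than from the source, which is why they can outlive each individual call to parse.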
A matching Into-style trait is probably needed.
ParseError type
The generic parse error needs to be able to report user-defined parse errors, parse failures due to insufficient data, an indication to retry, any IO error, and finally that there is nothing more to parse (i.e. a successful end).
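One possible shape for such a type, as a sketch only: the variant names and payloads below are assumptions chosen to match the five cases listed above, and should_continue is a hypothetical helper.

```rust
use std::io;

/// Hypothetical generic parse error. `'i` and `I` mirror the parameters
/// of the Source trait; `E` is the user-defined error type.
#[derive(Debug)]
pub enum ParseError<'i, I, E> {
    /// The user's parser failed; carries the remaining input and the error.
    Parse(&'i [I], E),
    /// Not enough data; carries the total number of items requested.
    Incomplete(usize),
    /// Some data was read but parsing did not finish; try again.
    Retry,
    /// The underlying reader failed.
    Io(io::Error),
    /// Nothing more to parse (successful end).
    EndOfInput,
}

/// Whether a driving loop should attempt another read and parse.
pub fn should_continue<I, E>(e: &ParseError<'_, I, E>) -> bool {
    matches!(e, ParseError::Retry)
}
```

Keeping the successful end as a variant lets a driving loop handle every exit condition in a single match.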
1: A rope data structure whose pieces are yielded as they become available could enable a fairly efficient "zero-copy" parser that can resume. But it is "zero-copy" only in the sense of not copying the input into the parser; the data in the rope still needs to be allocated on some arena, heap, or buffer before being passed to the parser.