m4rw3r/chomp

Streaming utilities and generalized input management

m4rw3r opened this issue

  • Datasources
    • io::Read
    • Iterator
  • Buffers
    • Fixed size
    • Growing
      • Capped
  • Parse sources
    • Slice source
    • DataSource source
      • Manually managed variant
  • Trait for sources
  • ParseError

Since Chomp is a slice-based parser, it cannot use an Iterator- or Read-based input directly; it has to work on slices, since that is the most efficient way to allow zero-copy parsing[1].
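To make the zero-copy point concrete, here is a minimal Rust sketch (not Chomp's API; `Step` and `word` are hypothetical names) of a slice parser whose result borrows directly from its input. A parser fed from an Iterator or Read source would first have to copy the bytes into memory it owns before it could hand out borrows like these.

```rust
/// Result of a hypothetical slice parser: a value plus the unconsumed
/// rest of the input, or a signal that the slice ended too early.
#[derive(Debug, PartialEq)]
pub enum Step<'i, T> {
    Done(T, &'i [u8]),
    Incomplete,
}

/// Parses one space-terminated word. The returned word is a subslice of
/// `input`, so no bytes are copied.
pub fn word(input: &[u8]) -> Step<'_, &[u8]> {
    match input.iter().position(|&b| b == b' ') {
        Some(i) => Step::Done(&input[..i], &input[i + 1..]),
        None => Step::Incomplete,
    }
}
```

The word returned for `b"hello world"` points into the original byte string, which is only possible because the whole input is available as one slice.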

Buffers

Buffers are generic over std::io::Read. They should automatically fill themselves when necessary, if configured to do so. If parsing fails with the data acquired from the Read source so far, but the buffer managed to read more than zero bytes, a Retry error should be returned to indicate that another attempt to read and parse the data will be necessary.
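A minimal sketch of such a buffer, assuming a simplified parser callback that sees the buffered bytes and returns either a value with the number of bytes consumed or `Incomplete`. `ReadBuffer`, `Parsed` and `BufError` are hypothetical names for illustration, not Chomp's API.

```rust
use std::io::{self, Read};

/// What a hypothetical parser callback reports to the buffer.
pub enum Parsed<T> {
    /// A value plus the number of bytes it consumed.
    Done(T, usize),
    /// The parser needs more bytes than are currently buffered.
    Incomplete,
}

/// Hypothetical error type for this sketch only.
#[derive(Debug)]
pub enum BufError {
    /// More bytes were read; calling `parse` again may now succeed.
    Retry,
    /// The source returned 0 bytes, so retrying cannot help.
    Incomplete,
    /// The underlying Read source failed.
    Io(io::Error),
}

/// A buffer that refills itself from any `std::io::Read`.
pub struct ReadBuffer<R> {
    source: R,
    data: Vec<u8>,
    chunk: usize,
}

impl<R: Read> ReadBuffer<R> {
    pub fn new(source: R, chunk: usize) -> Self {
        ReadBuffer { source, data: Vec::new(), chunk }
    }

    pub fn parse<T, F>(&mut self, f: F) -> Result<T, BufError>
    where
        F: FnOnce(&[u8]) -> Parsed<T>,
    {
        let outcome = f(&self.data[..]);
        match outcome {
            Parsed::Done(value, consumed) => {
                // Drop the consumed prefix and keep any leftover bytes.
                self.data.drain(..consumed);
                Ok(value)
            }
            Parsed::Incomplete => {
                // Automatic fill: read up to `chunk` more bytes.
                let mut tmp = vec![0u8; self.chunk];
                let n = self.source.read(&mut tmp).map_err(BufError::Io)?;
                self.data.extend_from_slice(&tmp[..n]);
                if n > 0 {
                    Err(BufError::Retry)
                } else {
                    Err(BufError::Incomplete)
                }
            }
        }
    }
}
```

Each Retry means new bytes arrived, so the caller simply calls `parse` again; `Incomplete` is only returned once the source reports that it has no more data.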

Growable buffer

Useful for parsing trusted data, or where the amount of memory allocated for the parser buffer does not matter. It should have an optional maximum buffer size, and if this limit is hit and parsing still fails, it should error in the same way as the fixed-size buffer.

TODO: More spec
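One way the optional cap could work, sketched under the assumption that the parser reports the total number of bytes it wants when it runs out of input (`Parsed::Incomplete(n)`). The buffer grows toward that amount but never beyond `max`, and only errors once it is already at the cap and the parser still fails. `GrowingBuffer`, `Parsed` and `GrowError` are hypothetical names.

```rust
use std::io::{self, Read};

/// Parser outcome; `Incomplete` carries the total number of bytes the
/// parser wants to see (an assumption of this sketch).
pub enum Parsed<T> {
    Done(T, usize),
    Incomplete(usize),
}

#[derive(Debug)]
pub enum GrowError {
    Retry,
    /// Cap reached and the parser still wants more: same shape as the
    /// fixed-size buffer's error, carrying the total requested.
    TooLarge { requested: usize, limit: usize },
    Eof,
    Io(io::Error),
}

pub struct GrowingBuffer<R> {
    source: R,
    data: Vec<u8>,
    max: Option<usize>,
}

impl<R: Read> GrowingBuffer<R> {
    pub fn new(source: R, max: Option<usize>) -> Self {
        GrowingBuffer { source, data: Vec::new(), max }
    }

    pub fn parse<T, F>(&mut self, f: F) -> Result<T, GrowError>
    where
        F: FnOnce(&[u8]) -> Parsed<T>,
    {
        let outcome = f(&self.data[..]);
        let wanted = match outcome {
            Parsed::Done(value, used) => {
                self.data.drain(..used);
                return Ok(value);
            }
            Parsed::Incomplete(wanted) => wanted,
        };
        // Grow toward what the parser asked for, but never past the cap.
        let target = match self.max {
            Some(limit) if self.data.len() >= limit => {
                return Err(GrowError::TooLarge { requested: wanted, limit });
            }
            Some(limit) => wanted.min(limit),
            None => wanted,
        };
        let mut tmp = vec![0u8; target.saturating_sub(self.data.len())];
        let n = self.source.read(&mut tmp).map_err(GrowError::Io)?;
        self.data.extend_from_slice(&tmp[..n]);
        if n > 0 { Err(GrowError::Retry) } else { Err(GrowError::Eof) }
    }
}
```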

Fixed size buffer

Useful when parsing data from an unknown source. It should allocate only a single slab on construction and then attempt to parse with at most the full size of the fixed-size buffer. If a parser still wants more data than the fixed-size buffer can provide, it should return an error indicating that the operation could not complete, carrying the total amount of data the parser requested (i.e. including the currently used part of the fixed buffer).
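A sketch of the fixed-size variant, again assuming the parser reports the total number of bytes it needs. The slab is allocated once in `new`, consumed bytes are compacted away before each refill, and a request larger than the whole slab produces `TooSmall` with the total requested. `FixedBuffer`, `Parsed` and `FixedError` are hypothetical names.

```rust
use std::io::{self, Read};

pub enum Parsed<T> {
    Done(T, usize),
    /// Total number of bytes the parser needs to see.
    Incomplete(usize),
}

#[derive(Debug)]
pub enum FixedError {
    Retry,
    /// Even a completely full slab would be too small.
    TooSmall { requested: usize, capacity: usize },
    Eof,
    Io(io::Error),
}

pub struct FixedBuffer<R> {
    source: R,
    slab: Box<[u8]>,
    start: usize, // first unconsumed byte
    end: usize,   // one past the last buffered byte
}

impl<R: Read> FixedBuffer<R> {
    pub fn new(source: R, capacity: usize) -> Self {
        // The only allocation this buffer ever makes.
        let slab = vec![0u8; capacity].into_boxed_slice();
        FixedBuffer { source, slab, start: 0, end: 0 }
    }

    pub fn parse<T, F>(&mut self, f: F) -> Result<T, FixedError>
    where
        F: FnOnce(&[u8]) -> Parsed<T>,
    {
        let outcome = f(&self.slab[self.start..self.end]);
        let requested = match outcome {
            Parsed::Done(value, used) => {
                self.start += used;
                return Ok(value);
            }
            Parsed::Incomplete(requested) => requested,
        };
        let capacity = self.slab.len();
        if requested > capacity {
            return Err(FixedError::TooSmall { requested, capacity });
        }
        // Compact: move unconsumed bytes to the front to free space at the end.
        self.slab.copy_within(self.start..self.end, 0);
        self.end -= self.start;
        self.start = 0;
        let n = self.source.read(&mut self.slab[self.end..]).map_err(FixedError::Io)?;
        self.end += n;
        if n > 0 { Err(FixedError::Retry) } else { Err(FixedError::Eof) }
    }
}
```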

Manually managed buffer

A buffer where the user controls when to attempt to fill the buffer. Useful for e.g. cooperative multitasking, where an input could otherwise saturate the parser and prevent any other operation from running.

This should probably be a configuration option on the existing buffers: it would make the buffer skip automatic filling and only return Retry until the user has called a method asking the buffer to fill itself.
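A sketch of the configuration-option approach, with a hypothetical `auto_fill` flag and `fill` method (neither is Chomp's API): in manual mode `parse` never reads from the source and just returns Retry, leaving the caller to decide when to call `fill`.

```rust
use std::io::{self, Read};

pub enum Parsed<T> {
    Done(T, usize),
    Incomplete,
}

#[derive(Debug)]
pub enum BufError {
    Retry,
    Eof,
    Io(io::Error),
}

pub struct Buffer<R> {
    source: R,
    data: Vec<u8>,
    chunk: usize,
    /// `false` selects the manually managed mode.
    auto_fill: bool,
}

impl<R: Read> Buffer<R> {
    pub fn new(source: R, chunk: usize, auto_fill: bool) -> Self {
        Buffer { source, data: Vec::new(), chunk, auto_fill }
    }

    /// Reads at most one chunk; returns how many bytes arrived (0 = end of source).
    pub fn fill(&mut self) -> io::Result<usize> {
        let mut tmp = vec![0u8; self.chunk];
        let n = self.source.read(&mut tmp)?;
        self.data.extend_from_slice(&tmp[..n]);
        Ok(n)
    }

    pub fn parse<T, F>(&mut self, f: F) -> Result<T, BufError>
    where
        F: FnOnce(&[u8]) -> Parsed<T>,
    {
        let outcome = f(&self.data[..]);
        match outcome {
            Parsed::Done(value, used) => {
                self.data.drain(..used);
                Ok(value)
            }
            // Manual mode: never read from the source here, just hand control back.
            Parsed::Incomplete if !self.auto_fill => Err(BufError::Retry),
            Parsed::Incomplete => match self.fill() {
                Ok(0) => Err(BufError::Eof),
                Ok(_) => Err(BufError::Retry),
                Err(e) => Err(BufError::Io(e)),
            },
        }
    }
}
```

In manual mode the end of the source shows up as `fill` returning `Ok(0)` rather than as an error from `parse`.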

Source trait

Trait which enables different buffer implementations to be treated the same.

The internal storage is not necessarily tied to the source itself (e.g. a &[u8] supplied by the user), so the trait needs one lifetime for the struct implementing Source and one for the data itself, which the produced T and E depend on.

pub trait Source<'a, 'i, I> {
    fn parse<F, T, E>(&'a mut self, f: F) -> Result<T, ParseError<'i, I, E>>
      where F: FnOnce(Input<'i, I>) -> ParseResult<'i, I, T, E>,
            T: 'i,
            E: 'i;
}

This trait is not compatible with for-loops: self is borrowed mutably, and the resulting T and E keep at least the internal storage borrowed, which prevents another mutable borrow of self. A loop with match, a while loop, or macros will have to be used instead.
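The loop-plus-match style can be sketched with a simplified version of the trait above. As assumptions of this sketch, the parser callback takes a plain `&'i [u8]` and returns `Option<(T, usize)>` instead of Chomp's `Input`/`ParseResult`, and `Error` stands in for ParseError; `SliceSource`, `word` and `collect_words` are hypothetical. For a slice source the data lifetime `'i` belongs to the user's slice rather than to the source, so the collected words can outlive each `&'a mut` borrow. With a buffer-backed source, keeping the results around like this would keep the internal storage borrowed and block the next call.

```rust
/// Simplified stand-in for the proposed ParseError.
#[derive(Debug, PartialEq)]
pub enum Error {
    Incomplete,
    End,
}

/// Simplified version of the proposed trait with separate lifetimes for
/// the source borrow ('a) and the data ('i).
pub trait Source<'a, 'i> {
    fn parse<F, T>(&'a mut self, f: F) -> Result<T, Error>
    where
        F: FnOnce(&'i [u8]) -> Option<(T, usize)>,
        T: 'i;
}

/// Source over a user-supplied slice.
pub struct SliceSource<'i> {
    rest: &'i [u8],
}

impl<'i> SliceSource<'i> {
    pub fn new(data: &'i [u8]) -> Self {
        SliceSource { rest: data }
    }
}

impl<'a, 'i> Source<'a, 'i> for SliceSource<'i> {
    fn parse<F, T>(&'a mut self, f: F) -> Result<T, Error>
    where
        F: FnOnce(&'i [u8]) -> Option<(T, usize)>,
        T: 'i,
    {
        if self.rest.is_empty() {
            return Err(Error::End); // nothing more to parse
        }
        match f(self.rest) {
            Some((value, used)) => {
                self.rest = &self.rest[used..];
                Ok(value)
            }
            None => Err(Error::Incomplete),
        }
    }
}

/// Space-terminated word parser; the result borrows from the input.
pub fn word(input: &[u8]) -> Option<(&[u8], usize)> {
    input.iter().position(|&b| b == b' ').map(|i| (&input[..i], i + 1))
}

/// Drives the source with loop + match instead of a for-loop.
pub fn collect_words<'i>(src: &mut SliceSource<'i>) -> Result<Vec<&'i [u8]>, Error> {
    let mut words = Vec::new();
    loop {
        match src.parse(word) {
            Ok(w) => words.push(w),
            Err(Error::End) => return Ok(words),
            Err(e) => return Err(e),
        }
    }
}
```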

A matching Into-style trait is probably needed.
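A possible shape for that Into-style trait, modelled on std's `IntoIterator`; `IntoSource`, `SliceSource`, `ReadSource` and `open` are hypothetical names. A blanket `impl<R: Read> IntoSource for R` would overlap with the `&[u8]` impl, because `&[u8]` itself implements `Read`, so this sketch implements the trait for a concrete reader type (`std::io::Cursor`) instead.

```rust
use std::io::{Cursor, Read};

/// Stand-in sources for this sketch.
pub struct SliceSource<'i> {
    pub data: &'i [u8],
}

pub struct ReadSource<R: Read> {
    pub reader: R,
    pub buffer: Vec<u8>,
}

/// Conversion into a source, analogous to IntoIterator.
pub trait IntoSource {
    type Source;
    fn into_source(self) -> Self::Source;
}

impl<'i> IntoSource for &'i [u8] {
    type Source = SliceSource<'i>;
    fn into_source(self) -> SliceSource<'i> {
        SliceSource { data: self }
    }
}

impl<T: AsRef<[u8]>> IntoSource for Cursor<T> {
    type Source = ReadSource<Cursor<T>>;
    fn into_source(self) -> Self::Source {
        ReadSource { reader: self, buffer: Vec::new() }
    }
}

/// Accepts anything convertible into a source.
pub fn open<S: IntoSource>(input: S) -> S::Source {
    input.into_source()
}
```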

ParseError type

The generic parse error needs to be able to report user-defined parse errors, parse failures due to insufficient data, an indication to retry, any IO error, and finally that there is nothing more to parse (i.e. a successful end).
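A possible shape for that type, sketched as a Rust enum with one variant per case listed above. The variant names are assumptions, and so is the guess that the `'i` and `I` parameters in `ParseError<'i, I, E>` exist so that a user error can carry the input slice where it occurred.

```rust
use std::io;

/// Hypothetical sketch of the generic parse error.
#[derive(Debug)]
pub enum ParseError<'i, I, E> {
    /// A user-defined parse error, with the remaining input at the failure point.
    Error(&'i [I], E),
    /// Not enough data could be provided; carries the total amount requested.
    Incomplete(usize),
    /// More data was read; parsing should be attempted again.
    Retry,
    /// The underlying `std::io::Read` source failed.
    IoError(io::Error),
    /// Nothing more to parse: the successful end of the stream.
    EndOfInput,
}

// Lets buffer implementations use `?` on I/O calls.
impl<'i, I, E> From<io::Error> for ParseError<'i, I, E> {
    fn from(e: io::Error) -> Self {
        ParseError::IoError(e)
    }
}
```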

[1] A rope data structure, where pieces are yielded as they become available, could enable a fairly efficient "zero-copy" parser with the possibility to resume. However, it would only be "zero-copy" in the sense of not copying the input into the parser: the data in the rope still needs to be allocated in some arena, heap or buffer before being passed to the parser.