m4rw3r/chomp

Size hint for internal iterator used by `many*` and `count`

m4rw3r opened this issue · 4 comments

Implementing Iterator::size_hint will enable more efficient allocation for the containers. According to profiling a lot of time is spent allocating and reallocating in some situations, a better size_hint would improve performance.

  • count should yield (n, Some(n)) since it will always result in n elements on success.
  • many1 and sep_by1 should yield (1, None) since they have no upper bound
  • many, many_till and sep_by can use the default value of (0, None)

It might also be feasible to implement a combinator which is a hybrid of count and many, where specifying both a lower and upper bound is possible. This would make it a lot more efficient to allocate some parts (and by using monadic composition it can even reserve space for a certain known number of elements which was specified earlier in the message).

Spec for bounded many

fn many(Input<I>, R, Parser) -> ParseResult<I, T, E>
  where R:      BoundedRange,
        Parser: FnMut(Input<I>) -> ParseResult<I, U, E>,
        T:      FromIterator<U>

trait BoundedRange { ... }

impl BoundedRange for Range { ... }
impl BoundedRange for RangeFull { ... }
impl BoundedRange for RangeFrom { ... }
impl BoundedRange for RangeTo { ... }
  • Iteration should stop once the max value is reached (if it is specified by the range), no more than n items should be emitted unless the range is lacking an upper bound
  • A size_hint based on the range should be provided
  • If an error or incomplete is encountered outside of the range (ie. if fewer items than the lower bound have been emitted), the error should be propagated
  • If an error is encountered inside of the range the parser should be considered complete and return the resulting FromIterator value
  • If an incomplete is encountered inside of the range of the parser it should be considered complete if the input is END_OF_INPUT and input.len is 0.

TODO

  • bounded::many spec
  • BoundedRange trait
  • bounded::many impls
    • Replace uses of iter::Iter
      • count
      • many
      • many1
      • sep_by
      • sep_by1
  • bounded::many_till
    • Replace uses of iter::IterTill (many_till)
  • bounded::skip_many

Aww, did you decide this wasn't feasible?

It is merged in a local copy of master which I have not yet pushed, along with #18. Currently I am polishing some things and updating all the doctests to use parse_only instead of Input::new and unwrap. So it is not gone at all, just that the feature itself is done :)

The performance is promising, equal or slightly better performance in most cases except for in very small benchmarks.

It is merged in a local copy of master which I have not yet pushed,

Ah, gotcha. I'm just so used to seeing issues automatically closed by Github when the corresponding commit is pushed to master. When I didn't see the "X closed by Y" message, I just assumed that meant that it couldn't be done.

The performance is promising

Great!

@shepmaster You can try it out now if you want, 0.2.0 is out :)