Complexity Requirements?
brevzin opened this issue · 7 comments
Thought I'd follow up my last question with another question.
I see that in flux, filter
's implementation of first
just does a find_if
:
flux/include/flux/op/filter.hpp
Lines 50 to 53 in 1c128b5
So is the intent that there is no complexity requirement on first
(and potentially last
and is_last
)?
My go-to example for the complexity requirements in Ranges is r.drop_last_while(p).stride(2)
, but in flux actually I think it's enough to ask about r.drop_last_while(p)
even by itself (which flux doesn't have yet, but don't treat this as a request to go rush to add it). If the flux::is_last
check is going to be implemented as a find_if
(or, whatever, find_last_if_not
) then iterating over this sequence becomes quadratic time:
flux/include/flux/op/for_each_while.hpp
Lines 24 to 28 in 1c128b5
Another good question that I really ought to document somewhere! I actually changed this recently -- previously filter
, drop_while
etc did the same sort of caching as Ranges.
The new policy is as follows:
is_last()
implementations must beO(1)
- If an implementation provides
last()
it should beO(1)
- For random-access sequences,
first()
must beO(1)
as well
For sequences weaker than random access, first()
may be at worst O(N)
. (This exception exists pretty much solely to allow filter
to be const iterable.)
I believe the constant-time is_last
requirement avoids quadratic behaviour in drop_last_while
, unless I'm misunderstanding? That was the intention, anyway.
Does that sound reasonable, or are there problematic cases I haven't considered?
is_last()
implementations must beO(1)
Which I guess means either flux::drop_last_while
must be non-const-iterable (so it can cache the last cursor, and would I guess be the one non-const-iterable sequence in flux? Or maybe you already have generators and I haven't looked?) or there just can't be a flux::drop_last_while
? The internal iteration version doesn't have this problem, so maybe it could be only provided in that context? Just throwing out more questions.
To explain the seemingly asymmetric requirement that last()
should be O(1)
, it's because I suspect that people coming from ranges will write things like
while (cur != seq.last()) { ... }
and I don't want that to be needlessly slow.
is_last()
implementations must beO(1)
Which I guess means either
flux::drop_last_while
must be non-const-iterable (so it can cache the last cursor, and would I guess be the one non-const-iterable sequence in flux? Or maybe you already have generators and I haven't looked?) or there just can't be aflux::drop_last_while
? The internal iteration version doesn't have this problem, so maybe it could be only provided in that context? Just throwing out more questions.
Yeah, drop_last_while
, when I come to write it, will probably need to do internal caching and therefore not be const iterable -- but since it's likely to be much less commonly used than filter
or (front) drop_while
, I'm less worried about it.
We already have a cache_last()
adaptor that is non-const iterable (and does exactly what the name suggests) as well as several single-pass-only sequences, so it wouldn't be the only one.
So the theory that is_last()
must be O(1)
but first()
can be linear is all very well, but not when I do things like this:
flux/include/flux/op/reverse.hpp
Lines 61 to 64 in 1c128b5
This is... really not great :(
Thanks Barry for asking tricky questions and making me think deeply about this!
I think that silent, unexpected quadratic behaviour is absolutely unacceptable. That means that is_last(seq, cur)
must be constant time (since we call it for every element when iterating), and so we need to think carefully about what happens with things like reverse_adaptor::is_last
above.
There are a few different options:
-
We could go back to requiring that
first()
is O(1) for all sequences. This would mean re-adding caching tofilter
,drop_while
and various others, in turn making them not const-iterable. My main objections to this are:- It means we need to use
flux::mut_ref
rather thanflux::ref
to pass them to e.g. a zip adaptor, and as a general rule we try to discourage usingmut_ref
. It's pretty unfortunate if one of the two most common adaptors requires it. - The fact that it's not const iterable, and/or you need to use
mut_ref
is not very discoverable unless you already know that it does internal caching
This basically just applies to
filter
because it's so commonly used: unlike certain other people, I'm not opposed to non-const iterable sequences in principle. We already have several of them in Flux. - It means we need to use
-
We could add caching to
reverse_adaptor
instead. But this is actually worse, because in most cases the underlying sequence does have a cheap O(1)first()
operation (e.g.vector
), and so caching is going to add overhead and probably worse codegen (although that's already pretty terrible forreverse_adaptor
outside of the internal iteration fast path). -
We could have a concept that says "I have constant-time
first()
", and makereverse_adaptor
require it. So thenvector
would model the concept butfilter_adaptor
would not, and the user would need to manually add some sort of caching adaptor if they wanted to reverse a filtered sequence.
It turns out that for option 3, we don't actually need a new concept at all: instead, we can add a semantic requirement to the bounded_sequence
concept that both first()
and last()
are O(1). This makes the bounded sequence concept pretty easy to explain ("the sequence can produce cursors to the beginning and end in constant time") and is actually a lot nicer than having uneven requirements on first and last as mentioned above.
So the trade off is that in order to have a const-iterable filter
, it isn't reversible without adding a caching adaptor. That actually seems pretty reasonable to me.
Any thoughts @brevzin?
Returning to this after a while... my latest thinking is as follows:
- The
last()
function, if provided, must beO(1)
(otherwise there's no point to it, as for any finite multipass sequence we can always find the end in O(N) simply by iterating over it) - For random-access sequences,
first()
must be O(1) as well - For weaker sequences,
first()
may be O(N), but in this case they must also provide a marker, something likedisable_bounded_sequence = true
- The
bounded_sequence
concept requires both that alast()
function exists, and!disable_bounded_sequence
I think this would cover everything.... the only downside is that just about every adaptor would need to do something like disable_bounded_sequence = !bounded_sequence<Base>
which is the sort of tedious, easy-to-forget boilerplate that I really wanted to avoid in Flux :(