tcbrindle/flux

Complexity Requirements?

brevzin opened this issue · 7 comments

Thought I'd follow up my last question with another question.

I see that in flux, filter's implementation of first just does a find_if:

static constexpr auto first(auto& self) -> cursor_type
{
return cursor_type{flux::find_if(self.base_, std::ref(self.pred_))};
}

So is the intent that there is no complexity requirement on first (and potentially last and is_last)?

My go-to example for the complexity requirements in Ranges is r.drop_last_while(p).stride(2), but in flux actually I think it's enough to ask about r.drop_last_while(p) even by itself (which flux doesn't have yet, but don't treat this as a request to go rush to add it). If the flux::is_last check is going to be implemented as a find_if (or, whatever, find_last_if_not) then iterating over this sequence becomes quadratic time:

auto cur = first(seq);
while (!is_last(seq, cur)) {
if (!std::invoke(pred, read_at(seq, cur))) { break; }
inc(seq, cur);
}

Another good question that I really ought to document somewhere! I actually changed this recently -- previously filter, drop_while etc did the same sort of caching as Ranges.

The new policy is as follows:

  • is_last() implementations must be O(1)
  • If an implementation provides last() it should be O(1)
  • For random-access sequences, first() must be O(1) as well

For sequences weaker than random access, first() may be at worst O(N). (This exception exists pretty much solely to allow filter to be const iterable.)

I believe the constant-time is_last requirement avoids quadratic behaviour in drop_last_while, unless I'm misunderstanding? That was the intention, anyway.

Does that sound reasonable, or are there problematic cases I haven't considered?

  • is_last() implementations must be O(1)

Which I guess means either flux::drop_last_while must be non-const-iterable (so it can cache the last cursor, and would I guess be the one non-const-iterable sequence in flux? Or maybe you already have generators and I haven't looked?) or there just can't be a flux::drop_last_while? The internal iteration version doesn't have this problem, so maybe it could be only provided in that context? Just throwing out more questions.

To explain the seemingly asymmetric requirement that last() should be O(1), it's because I suspect that people coming from ranges will write things like

while (cur != seq.last()) { ... }

and I don't want that to be needlessly slow.

  • is_last() implementations must be O(1)

Which I guess means either flux::drop_last_while must be non-const-iterable (so it can cache the last cursor, and would I guess be the one non-const-iterable sequence in flux? Or maybe you already have generators and I haven't looked?) or there just can't be a flux::drop_last_while? The internal iteration version doesn't have this problem, so maybe it could be only provided in that context? Just throwing out more questions.

Yeah, drop_last_while, when I come to write it, will probably need to do internal caching and therefore not be const iterable -- but since it's likely to be much less commonly used than filter or (front) drop_while, I'm less worried about it.

We already have a cache_last() adaptor that is non-const iterable (and does exactly what the name suggests) as well as several single-pass-only sequences, so it wouldn't be the only one.

So the theory that is_last() must be O(1) but first() can be linear is all very well, but not when I do things like this:

static constexpr auto is_last(auto& self, cursor_type const& cur) -> bool
{
return cur.base_cur == flux::first(self.base_);
}

This is... really not great :(

Thanks Barry for asking tricky questions and making me think deeply about this!

I think that silent, unexpected quadratic behaviour is absolutely unacceptable. That means that is_last(seq, cur) must be constant time (since we call it for every element when iterating), and so we need to think carefully about what happens with things like reverse_adaptor::is_last above.

There are a few different options:

  1. We could go back to requiring that first() is O(1) for all sequences. This would mean re-adding caching to filter, drop_while and various others, in turn making them not const-iterable. My main objections to this are:

    • It means we need to use flux::mut_ref rather than flux::ref to pass them to e.g. a zip adaptor, and as a general rule we try to discourage using mut_ref. It's pretty unfortunate if one of the two most common adaptors requires it.
    • The fact that it's not const iterable, and/or you need to use mut_ref is not very discoverable unless you already know that it does internal caching

    This basically just applies to filter because it's so commonly used: unlike certain other people, I'm not opposed to non-const iterable sequences in principle. We already have several of them in Flux.

  2. We could add caching to reverse_adaptor instead. But this is actually worse, because in most cases the underlying sequence does have a cheap O(1) first() operation (e.g. vector), and so caching is going to add overhead and probably worse codegen (although that's already pretty terrible for reverse_adaptor outside of the internal iteration fast path).

  3. We could have a concept that says "I have constant-time first()", and make reverse_adaptor require it. So then vector would model the concept but filter_adaptor would not, and the user would need to manually add some sort of caching adaptor if they wanted to reverse a filtered sequence.

It turns out that for option 3, we don't actually need a new concept at all: instead, we can add a semantic requirement to the bounded_sequence concept that both first() and last() are O(1). This makes the bounded sequence concept pretty easy to explain ("the sequence can produce cursors to the beginning and end in constant time") and is actually a lot nicer than having uneven requirements on first and last as mentioned above.

So the trade off is that in order to have a const-iterable filter, it isn't reversible without adding a caching adaptor. That actually seems pretty reasonable to me.

Any thoughts @brevzin?

Returning to this after a while... my latest thinking is as follows:

  • The last() function, if provided, must be O(1) (otherwise there's no point to it, as for any finite multipass sequence we can always find the end in O(N) simply by iterating over it)
  • For random-access sequences, first() must be O(1) as well
  • For weaker sequences, first() may be O(N), but in this case they must also provide a marker, something like disable_bounded_sequence = true
  • The bounded_sequence concept requires both that a last() function exists, and !disable_bounded_sequence

I think this would cover everything.... the only downside is that just about every adaptor would need to do something like disable_bounded_sequence = !bounded_sequence<Base> which is the sort of tedious, easy-to-forget boilerplate that I really wanted to avoid in Flux :(