tcbrindle/flux

Add adjacent_remove_if (like) adaptor

tcbrindle opened this issue · 5 comments

From @codereport on Twitter "X":

I was looking at that solution the other day and have been meaning to open an issue requesting the equivalent of range-v3's adjacent_remove_if

So here it is!

Range-V3 has both adjacent_remove_if and adjacent_filter, which seem like one should just call the other with an inverted predicate, but which actually do slightly different things:

views::adjacent_filter: For each pair of adjacent elements in a source range, evaluate the specified binary predicate. If the predicate evaluates to false, the second element of the pair is removed from the result range; otherwise, it is included. The first element in the source range is always included. (For instance, adjacent_filter with std::not_equal_to filters out all the non-unique elements.)

views::adjacent_remove_if: or each pair of adjacent elements in a source range, evaluate the specified binary predicate. If the predicate evaluates to true, the first element of the pair is removed from the result range; otherwise, it is included. The last element in the source range is always included.

There's also views::unique, which actually does call adjacent_filter with an inverted predicate.

The end result is that unique retains the first of a run of elements that satisfy the predicate, whereas adjacent_remove_if retains the last such element (example).

So the question is, which behaviour do we want in Flux? Is it worth trying to provide both, like Range-V3 does, or would that just be confusing? (Given that it's just taken me an hour to work out why Rv3 provides both unique and adjacent_remove_if when one appears redundant, I'm voting for "confusing".)

After we've sorted that out, there's also the question of naming:

  • adjacent_remove_if is very long, and we don't use the term remove_if anywhere else in Flux, but it does at least describes what it does
  • adjacent_filter is a little bit shorter, but retaining elements that match the predicate is probably not what users would want most of the time
  • unique is consistent with the STL, but (to me) has strong connotations of unique-ifying the whole sequence rather than just adjacent elements (unless the sequence was sorted to begin with)

The Rust itertools crate has a pair of functions, dedup() and dedup_by(), the latter of which takes a predicate and the former specialised for equality*. In C++ we could just say dedup() with a default parameter, which I quite like -- although again, it doesn't necessarily make clear that it only removes adjacent duplicates.

*: The example for dedup_by suggests that the Rust version has the same behaviour as Rv3's views::unique (retain the first element), but this isn't actually made clear in the docs

@codereport Any thoughts on what behaviour and/or name(s) Flux should use?

Ping @brevzin as well if case you have any suggestions!

I actually didn't know adjacent_filter existed. Furthermore, this slight different difference seems like an incredible gotcha. It breaks the "symmetry" of filter and remove_if.

In C++ we could just say dedup() with a default parameter, which I quite like

I would veer away from default parameters. My rule of thumb is except for very common function names like sum (which I would assume is just a specialization), avoid default params and force the user to choose default operation. This is was got us into trouble with std::mismatch and std::adjacent_difference. The zip_find and adjacent_transform versions in Flux and C++23 Ranges (respectively) are much better names and designs imo.

@codereport Any thoughts on what behaviour and/or name(s) Flux should use?

If this was C++26 ranges, I would definitely recommend the name adjacent_filter and keep the semantics of range-v3. However, Flux breaks tradition with things like using map instead of transform. That being said, it does have both adjacent_map and filter so I would probably still recommend adjacent_filter.

In general, I think Rust choosing dedup was a bad choice. [ at this point, I remember that I actually gave a ⚡ talk on this a couple years ago and went and found it https://www.youtube.com/watch?v=tsfaE-eDusg only to realize that it did not support what I was about to say 😂 ] .

image

..

So I guess dedup is the more common name for std::unique 🤦 lol.

I still like adjacent_filter, but if you like dedup adding a specialization similar to sum for fold sounds find to me.

Thanks for the feedback Conor!

So here's a proposal:

  • I'll add flux::adjacent_filter() taking a binary predicate, with the same semantics as Range-V3's views::adjacent_filter
  • I'll also add flux::dedup() which calls adjacent_filter(std::not_equal_to{}), as shorthand for the most common use-case

If we ever decide we want versions with similar semantics to Rv3's views::adjacent_remove_if, we can call them something like adjacent_filter_last/dedup_last.

Does that sound reasonable?

Sounds great to me 👍