Add adjacent_remove_if (like) adaptor
tcbrindle opened this issue · 5 comments
From @codereport on Twitter "X":
I was looking at that solution the other day and have been meaning to open an issue requesting the equivalent of range-v3's adjacent_remove_if
So here it is!
Range-V3 has both adjacent_remove_if and adjacent_filter, which seem like one should just call the other with an inverted predicate, but which actually do slightly different things:
views::adjacent_filter: For each pair of adjacent elements in a source range, evaluate the specified binary predicate. If the predicate evaluates to false, the second element of the pair is removed from the result range; otherwise, it is included. The first element in the source range is always included. (For instance, adjacent_filter with std::not_equal_to filters out all the non-unique elements.)
views::adjacent_remove_if: or each pair of adjacent elements in a source range, evaluate the specified binary predicate. If the predicate evaluates to true, the first element of the pair is removed from the result range; otherwise, it is included. The last element in the source range is always included.
There's also views::unique, which actually does call adjacent_filter with an inverted predicate.
The end result is that unique retains the first of a run of elements that satisfy the predicate, whereas adjacent_remove_if retains the last such element (example).
So the question is, which behaviour do we want in Flux? Is it worth trying to provide both, like Range-V3 does, or would that just be confusing? (Given that it's just taken me an hour to work out why Rv3 provides both unique and adjacent_remove_if when one appears redundant, I'm voting for "confusing".)
After we've sorted that out, there's also the question of naming:
adjacent_remove_ifis very long, and we don't use the termremove_ifanywhere else in Flux, but it does at least describes what it doesadjacent_filteris a little bit shorter, but retaining elements that match the predicate is probably not what users would want most of the timeuniqueis consistent with the STL, but (to me) has strong connotations of unique-ifying the whole sequence rather than just adjacent elements (unless the sequence was sorted to begin with)
The Rust itertools crate has a pair of functions, dedup() and dedup_by(), the latter of which takes a predicate and the former specialised for equality*. In C++ we could just say dedup() with a default parameter, which I quite like -- although again, it doesn't necessarily make clear that it only removes adjacent duplicates.
*: The example for dedup_by suggests that the Rust version has the same behaviour as Rv3's views::unique (retain the first element), but this isn't actually made clear in the docs
@codereport Any thoughts on what behaviour and/or name(s) Flux should use?
Ping @brevzin as well if case you have any suggestions!
I actually didn't know adjacent_filter existed. Furthermore, this slight different difference seems like an incredible gotcha. It breaks the "symmetry" of filter and remove_if.
In C++ we could just say dedup() with a default parameter, which I quite like
I would veer away from default parameters. My rule of thumb is except for very common function names like sum (which I would assume is just a specialization), avoid default params and force the user to choose default operation. This is was got us into trouble with std::mismatch and std::adjacent_difference. The zip_find and adjacent_transform versions in Flux and C++23 Ranges (respectively) are much better names and designs imo.
@codereport Any thoughts on what behaviour and/or name(s) Flux should use?
If this was C++26 ranges, I would definitely recommend the name adjacent_filter and keep the semantics of range-v3. However, Flux breaks tradition with things like using map instead of transform. That being said, it does have both adjacent_map and filter so I would probably still recommend adjacent_filter.
In general, I think Rust choosing dedup was a bad choice. [ at this point, I remember that I actually gave a ⚡ talk on this a couple years ago and went and found it https://www.youtube.com/watch?v=tsfaE-eDusg only to realize that it did not support what I was about to say 😂 ] .
..
So I guess dedup is the more common name for std::unique 🤦 lol.
I still like adjacent_filter, but if you like dedup adding a specialization similar to sum for fold sounds find to me.
Thanks for the feedback Conor!
So here's a proposal:
- I'll add
flux::adjacent_filter()taking a binary predicate, with the same semantics as Range-V3'sviews::adjacent_filter - I'll also add
flux::dedup()which callsadjacent_filter(std::not_equal_to{}), as shorthand for the most common use-case
If we ever decide we want versions with similar semantics to Rv3's views::adjacent_remove_if, we can call them something like adjacent_filter_last/dedup_last.
Does that sound reasonable?
Sounds great to me 👍
