Consistent comparisons
We have several adaptors and algorithms which take a comparator, and require that the comparator produces a strict weak order over the sequence elements:
- `find_min`/`find_max`/`find_minmax`
- `min`/`max`/`minmax`
- `sort`
- `set_union`/`set_difference`/`set_symmetric_difference`/`set_intersection`
- `cmp::min`/`cmp::max` (for a sequence of two elements)
The comparator for all of these functions follows the classic STL model of requiring a binary predicate which returns `true` if the first argument is "less than" the second, defaulting to `std::ranges::less`. We use the standard library `std::strict_weak_order` concept, in most cases via our own `strict_weak_order_for<Seq1, Seq2>` concept, which checks `std::strict_weak_order` for all combinations of the sequences' `element_t`, `value_t&` and `common_element_t` (the Flux equivalent of Ranges' `std::indirect_strict_weak_order`).
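For reference, a simplified sketch of what such a concept might look like, using Flux's `element_t`, `value_t` and `common_element_t` aliases (illustrative only, not the library's exact definition):

```cpp
#include <concepts>
// (plus the Flux header for the element_t/value_t/common_element_t aliases)

// Simplified sketch only -- the real strict_weak_order_for may differ
template <typename Cmp, typename Seq1, typename Seq2 = Seq1>
concept strict_weak_order_for =
    std::strict_weak_order<Cmp&, flux::element_t<Seq1>, flux::element_t<Seq2>> &&
    std::strict_weak_order<Cmp&, flux::element_t<Seq1>, flux::value_t<Seq2>&> &&
    std::strict_weak_order<Cmp&, flux::value_t<Seq1>&, flux::element_t<Seq2>> &&
    std::strict_weak_order<Cmp&, flux::value_t<Seq1>&, flux::value_t<Seq2>&> &&
    std::strict_weak_order<Cmp&, flux::common_element_t<Seq1>, flux::common_element_t<Seq2>>;
```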
The outlier here is `flux::compare(seq1, seq2, cmp)`, which requires the comparator to return a value of one of the three C++20 comparison categories (`std::partial_ordering`, `std::weak_ordering` or `std::strong_ordering`) and performs a lexicographical comparison of the sequence elements, returning that category -- that is, the same as `std::lexicographical_compare_three_way`.
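For example (assuming, for illustration, that the default comparator is `std::compare_three_way`):

```cpp
#include <compare>
#include <vector>
// (plus the Flux header for flux::compare)

std::vector<int> a{1, 2, 3};
std::vector<int> b{1, 2, 4};

// Lexicographical three-way comparison, like
// std::lexicographical_compare_three_way
auto ord = flux::compare(a, b); // std::strong_ordering::less
```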
First of all, this is inconsistent. We shouldn't have one algorithm with a completely different interface to all the others.
A simple solution would be to change our `compare` to be the equivalent of `std::lexicographical_compare` -- that is, require just a "less than" predicate like everything else.
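Under that change, the same call would take a `bool`-returning predicate and itself return `bool` (hypothetical, mirroring `std::lexicographical_compare`):

```cpp
#include <functional>
#include <vector>

std::vector<int> a{1, 2, 3};
std::vector<int> b{1, 2, 4};

// Hypothetical post-change interface: "less than" predicate in, bool out
bool less = flux::compare(a, b, std::ranges::less{}); // true
```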
A potentially better solution would be to consistently use three-way comparison everywhere -- specifically, requiring the comparator for all of the above algorithms (except `compare`) to return at least `std::weak_ordering`. After all, we're a C++20 library, we can do things the modern way!
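As a sketch, "at least `std::weak_ordering`" could be expressed as a concept along these lines (`weak_ordering_for` is a made-up name for illustration):

```cpp
#include <compare>
#include <concepts>
#include <functional>

// Made-up constraint: the comparator's result must convert to
// std::weak_ordering, which admits weak_ordering and strong_ordering
// but rejects partial_ordering (and plain bool)
template <typename Cmp, typename T, typename U = T>
concept weak_ordering_for = requires(Cmp& cmp, T& lhs, U& rhs) {
    { cmp(lhs, rhs) } -> std::convertible_to<std::weak_ordering>;
};

static_assert(weak_ordering_for<std::compare_three_way, int>);
static_assert(!weak_ordering_for<std::ranges::less, int>); // returns bool
```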
As with everything though, there are pros and cons to doing this:
Pros:

- Correctness. At the moment, strict weak ordering is basically just a semantic requirement. Although a user could theoretically supply a custom comparator that returns `std::weak_ordering` but "lies" about it, this seems much less likely than accidentally passing an incorrect predicate.
- Proper handling of NaNs: right now, `flux::sort(vector<float>)` compiles but does the wrong thing if the data contains NaN values. With the proposed change (assuming `std::compare_three_way` as the default comparator) this will fail to compile. This forces the user to think about how they want to handle NaNs: either by passing `std::strong_order` to use the IEEE total ordering, or by using a custom comparator that ignores NaN values (see the sketch after this list).
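To make the NaN point concrete (the post-change lines are hypothetical, assuming `std::compare_three_way` becomes the default):

```cpp
#include <cmath>   // NAN
#include <compare> // std::strong_order
#include <vector>
// (plus the Flux header for flux::sort)

std::vector<float> v{3.0f, NAN, 1.0f, 2.0f};

flux::sort(v); // today: compiles, but NaN comparisons always return false,
               // so this isn't a strict weak order and the result is garbage

// Under the proposal (hypothetical):
// flux::sort(v);                    // error: no default weak ordering for float
// flux::sort(v, std::strong_order); // OK: IEEE total order, NaNs sort to one end
```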
Cons:

- By default, only types with a spaceship operator could be used with the Flux algorithms. Of course, defaulting the operator is very easy in C++20, but there's probably a lot of code out there that only defines the classic relational operators.
- Potentially, worse codegen. These algorithms only really care about less-than, but custom comparators (or `op<=>` implementations) might need to do two compares rather than one. It's probably worth benchmarking this. In particular, using `std::strong_order` to compare floats generates a lot more code than `std::less`.
- Ergonomics: right now it's easy to sort in descending order by saying `sort(vec, std::greater{})`. We'd probably want to provide a comparator that inverts `ordering::greater` and `ordering::less` to allow the same thing without the user needing to write a lambda to do it themselves (see the sketch after this list).
- Unfamiliarity: the STL has used less-than predicate comparators for 30 years, and now suddenly we're asking people to do something different. Likewise, "why can't I use `flux::sort` on my vector of doubles? `std::sort` works fine".
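One possible shape for such an inverting comparator (a sketch only; `reverse_compare` is a made-up name):

```cpp
#include <compare>
#include <utility>

// Made-up adaptor: comparing with the arguments swapped turns `less` into
// `greater` and vice versa, while leaving `equivalent` unchanged
inline constexpr auto reverse_compare = []<typename T, typename U>(T&& lhs, U&& rhs) {
    return std::compare_three_way{}(std::forward<U>(rhs), std::forward<T>(lhs));
};

// usage (hypothetical): flux::sort(vec, reverse_compare);
```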
Overall, I think this is worth investigating -- at least, benchmarking what happens if we use our current `sort` with a comparator defined as (roughly):

```cpp
template <typename T>
bool cmp(T& lhs, T& rhs) { return std::is_lt(std::weak_order(lhs, rhs)); }
```

and seeing what happens.
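A minimal harness for that experiment might look something like this (purely a sketch; the data set and timing approach are illustrative):

```cpp
#include <chrono>
#include <compare>
#include <random>
#include <vector>
// (plus the Flux header for flux::sort)

int main() {
    // Fill a vector with pseudo-random data
    std::mt19937 gen(42);
    std::uniform_int_distribution<int> dist(0, 1'000'000);
    std::vector<int> v(1'000'000);
    for (auto& x : v) { x = dist(gen); }

    auto start = std::chrono::steady_clock::now();
    flux::sort(v, [](int const& lhs, int const& rhs) {
        return std::is_lt(std::weak_order(lhs, rhs));
    });
    auto elapsed = std::chrono::steady_clock::now() - start;
    // ...then compare `elapsed` against a run with the default comparator
}
```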