tcbrindle/flux

Consistent comparisons

tcbrindle opened this issue · 0 comments

We have several adaptors and algorithms which take a comparator, and require that the comparator produces a strict weak order over the sequence elements:

  • find_min/find_max/find_minmax
  • min/max/minmax
  • sort
  • set_union/set_difference/set_symmetric_difference/set_intersection
  • cmp::min/cmp::max (for a sequence of two elements)

The comparator for all of these functions follows the classic STL model of requiring a binary predicate which returns true if the first argument is "less than" the second, defaulting to std::ranges::less. We use the standard library std::strict_weak_order concept, in most cases via our own strict_weak_order_for<Seq1, Seq2> concept which checks std::strict_weak_order for all combinations of the sequences' element_t, value_t& and common_element_t (the Flux equivalent of Ranges' std::indirect_strict_weak_order).

The outlier here is flux::compare(seq1, seq2, cmp) which requires the comparator to return a value of one of the three C++20 comparison categories (std::partial_ordering, std::weak_ordering or std::strong_ordering) and performs a lexicographical comparison of the sequence elements returning that category -- that is, the same as std::lexicographical_compare_three_way.

First of all, this is inconsistent. We shouldn't have one algorithm with a completely different interface to all the others.

A simple solution would be to change our compare to be the equivalent of std::lexicographical_compare -- that is, require just a "less than" predicate like everything else.

A potentially better solution would be to consistently use three way comparison everywhere -- specifically, requiring a comparator for all of the above algorithms (except compare) to return at least std::weak_ordering. After all, we're a C++20 library, we can do things the modern way!

As with everything though, there are pros and cons to doing this:

Pros:

  • Correctness. At the moment, strict weak ordering is basically just a semantic requirement. Although a user could theoretically supply a custom comparator that returns std::weak_ordering but "lies" about it, this seems much less likely than accidentally passing an incorrect predicate.
  • Proper handling of NaNs: Right now, flux::sort(vector<float>) compiles but does the wrong thing if the data contains NaN values. With the proposed change (assuming std::compare_three_way as the default comparator), this will fail to compile. This forces the user to think about how they want to handle NaNs: either by passing std::strong_order to use the IEEE total ordering, or by using a custom comparator that ignores NaN values.

Cons:

  • By default, only types with a spaceship operator could be used with the Flux algorithms. Of course, defaulting the operator is very easy in C++20, but there's probably a lot of code out there that only defines the classic relational operators.
  • Potentially, worse codegen. These algorithms only really care about less-than, but custom comparators (or op<=> implementations) might need to do two compares rather than one. It's probably worth benchmarking this. In particular, using std::strong_order to compare floats generates a lot more code than std::less.
  • Ergonomics: right now it's easy to sort in descending order by saying sort(vec, std::greater{}). We'd probably want to provide a comparator that inverts ordering::greater and ordering::less to allow the same thing without the user needing to write a lambda to do it themselves.
  • Unfamiliarity: the STL has used less-than predicate comparators for 30 years, and now suddenly we're asking people to do something different. Likewise, "why can't I use flux::sort on my vector of doubles? std::sort works fine".

Overall, I think this is worth investigating -- at least, benchmarking what happens if we use our current sort with a comparator defined as (roughly):

template <typename T>
bool cmp(T const& lhs, T const& rhs) { return std::is_lt(std::weak_order(lhs, rhs)); }

and seeing what happens.