d3/d3-array

sort(A) and sort(A, d => d) are not equivalent

Fil opened this issue · 5 comments

Fil commented

in the way they handle non-comparable values:

const A = [1, undefined, -1];
d3.sort(A); // [-1, 1, undefined]
d3.sort(A, d => d); // [1, undefined, -1]

not sure if this should be considered a bug or not, but it can be surprising. The basic caveat is that having an undefined value (and after #210, a null), will throw a wrench into sorting—maybe it's something to add to the documentation instead.

Definitely a bug.

I’m also wondering now if ascending and descending should consider all non-comparable values as after comparable values, rather than having undefined behavior.

Fil commented

If we change ascending to:

function ascending(a, b) {
  if (a !== a) a = null; // NaN to null
  if (b !== b) b = null;
  return a == null ? b == null ? NaN : 1
    : b == null ? -1
    : a < b ? -1
    : a > b ? 1
    : a >= b ? 0
    : NaN;
}

there are consequences on greatest and greatestIndex (which must not select the undefined, so should use ascendingDefined as default comparator).

I was thinking that we should introducing d3.ascendingDefined and d3.descendingDefined (both of which put undefined values at the end, i.e., as the greatest values), and have d3.sort default to using d3.ascendingDefined instead of d3.ascending. I think there are valid use cases for both. We’d presumably use Plot’s implementation, unless we think it needs changing?

https://github.com/observablehq/plot/blob/ccb9c9a60700dd38096eca6b6ae658f8af0e038a/src/defined.js#L3-L13

(One question is that Number.isNaN will detect NaN directly, but won’t detect an object that coerces to NaN, such as an invalid Date. So maybe we should do a self comparison test !(x >= x) instead?)

In the accessor case (sort(objects, d => d.foo)) and if we consider the identity case equivalent (sort(values, d => d)), then I think we can change d3.sort to move null, NaN and undefined values to the end of the array as a generalization of array.sort’s behavior.

However, I’m not sure how to do that in the comparator case (sort(objects, (a, b) => a.foo - b.foo)) since the comparator doesn’t give us enough information: it doesn’t tell us which of a or b is undefined, so we don’t know which one to move to the end. I suppose there might be some way we could try to infer this information, but I think maybe we leave the behavior of d3.sort as-is in the comparator case, and focus on fixing the accessor and one-argument identity case.

compare(x, x) !== 0 is enough to tell us that we should move x to the end. But this requires a pass before sorting.