`sort`/`sortBy` Does Not Use The Correct Ordering & Violates Parametricity
eric-corumdigital opened this issue · 5 comments
All undefined elements are sorted to the end of the array, with no call to compareFunction.
Therefore, sorting on any type inhabited by undefined will not use the correct ordering (except by coincidence). This is also a parametricity violation.
The only truthful use of native sort is on JavaScript values.
I propose we replace the implementation with a correct one. Yes, it will be slower. We might define unsafeSort to use the native sort with the same typing as sort. Then we may also have a different sort definition typed on JavaScript values (which is Foreign).
I bear gifts:
function mergeFromTo(compare, ordToNum, xs1, xs2, from, to) {
var mid;
var i;
var j;
var k;
var x;
var y;
var c;
mid = from + ((to - from) >> 1);
if (mid - from > 1) mergeFromTo(compare, ordToNum, xs2, xs1, from, mid);
if (to - mid > 1) mergeFromTo(compare, ordToNum, xs2, xs1, mid, to);
i = from;
j = mid;
k = from;
while (i < mid && j < to) {
x = xs2[i];
y = xs2[j];
c = ordToNum(compare(x, y));
if (c > 0) {
xs1[k++] = y;
++j;
}
else if (c === 0) {
xs1[k++] = x;
xs1[k++] = y;
++i;
++j;
}
else {
xs1[k++] = x;
++i;
}
}
while (i < mid) {
xs1[k++] = xs2[i++];
}
while (j < to) {
xs1[k++] = xs2[j++];
}
}
function mergeSort(compare, ordToNum, xs) {
var out;
if (xs.length < 2) return xs;
out = xs.slice(0);
mergeFromTo(compare, ordToNum, out, xs.slice(0), 0, xs.length);
return out;
};
foreign import mergeSort ∷ ∀ a. Fn3 (Fn2 a a Ordering) (Ordering → Number) (Array a) (Array a)
Duplicates the input array twice. One copy is garbage and the other becomes the sorted array.
For number arrays, this is ~3.5x faster than V8's sort and ~0.9x as fast as SpiderMonkey's sort. Tested on random number arrays of magnitudes from 1 to 10,000,000. Nothing seemed to vary much with duplicates.
For non-number arrays, this is ~2x faster than V8's sort and ~0.2x as fast as SpiderMonkey's sort.
Special-casing small arrays on insertion sort caused a dramatic slowdown. I tried small sizes from 3 to 16 elements and it was always slower. This merge sort is already trivial (no recursion) on 2 elements or less.
Why not Fn2 a a Number? Well because I am avoiding allocating a closure to compose compare and ordToNum.
This does indeed seem like something we ought to fix. Your implementation seems fine to me too. Does anyone else have any thoughts?
Yeah, I think this is important to fix. I don't really like that it's such a slow down for SpiderMonkey, but it seems like a simple enough implementation that can be improved at some later point.
I don't understand the issue here, isn't sort constrained by Ord type class so the user will always have to provide a compare to do valid sorting?
Any kind of example to help understand this?
If you have a type that is inhabited by null/undefined at runtime, it won’t necessarily respect your Ord instance, since the JS sort algorithm treats those specially.