purescript/purescript-arrays

`sort`/`sortBy` Does Not Use The Correct Ordering & Violates Parametricity

eric-corumdigital opened this issue · 5 comments

All undefined elements are sorted to the end of the array, with no call to compareFunction.

https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Array/sort#Description

Therefore, sorting on any type inhabited by undefined will not use the correct ordering (except by coincidence). This is also a parametricity violation.

The only truthful use of native sort is on JavaScript values.

I propose we replace the implementation with a correct one. Yes, it will be slower. We might define unsafeSort to use the native sort with the same typing as sort. Then we may also have a different sort definition typed on JavaScript values (which is Foreign).

I bear gifts:

function mergeFromTo(compare, ordToNum, xs1, xs2, from, to) {

  var mid;
  var i;
  var j;
  var k;
  var x;
  var y;
  var c;

  mid = from + ((to - from) >> 1);
  if (mid - from > 1) mergeFromTo(compare, ordToNum, xs2, xs1, from, mid);
  if (to - mid > 1) mergeFromTo(compare, ordToNum, xs2, xs1, mid, to);
  
  i = from;
  j = mid;
  k = from;
  while (i < mid && j < to) {
    x = xs2[i];
    y = xs2[j];
    c = ordToNum(compare(x, y));
    if (c > 0) {
      xs1[k++] = y;
      ++j;
    }
    else if (c === 0) {
      xs1[k++] = x;
      xs1[k++] = y;
      ++i;
      ++j;
    }
    else {
      xs1[k++] = x;
      ++i;
    }
  }
  while (i < mid) {
    xs1[k++] = xs2[i++];
  }
  while (j < to) {
    xs1[k++] = xs2[j++];
  }
  
}

function mergeSort(compare, ordToNum, xs) {

  var out;

  if (xs.length < 2) return xs;
  
  out = xs.slice(0);
  mergeFromTo(compare, ordToNum, out, xs.slice(0), 0, xs.length);

  return out;
};
foreign import mergeSort ∷ ∀ a. Fn3 (Fn2 a a Ordering) (Ordering → Number) (Array a) (Array a)

Duplicates the input array twice. One copy is garbage and the other becomes the sorted array.

For number arrays, this is ~3.5x faster than V8's sort and ~0.9x as fast as SpiderMonkey's sort. Tested on random number arrays of magnitudes from 1 to 10,000,000. Nothing seemed to vary much with duplicates.

For non-number arrays, this is ~2x faster than V8's sort and ~0.2x as fast as SpiderMonkey's sort.

Special-casing small arrays on insertion sort caused a dramatic slowdown. I tried small sizes from 3 to 16 elements and it was always slower. This merge sort is already trivial (no recursion) on 2 elements or less.

Why not Fn2 a a Number? Well because I am avoiding allocating a closure to compose compare and ordToNum.

This does indeed seem like something we ought to fix. Your implementation seems fine to me too. Does anyone else have any thoughts?

Yeah, I think this is important to fix. I don't really like that it's such a slow down for SpiderMonkey, but it seems like a simple enough implementation that can be improved at some later point.

I don't understand the issue here, isn't sort constrained by Ord type class so the user will always have to provide a compare to do valid sorting?

Any kind of example to help understand this?

If you have a type that is inhabited by null/undefined at runtime, it won’t necessarily respect your Ord instance, since the JS sort algorithm treats those specially.