dart-lang/collection

mergeSort fails when static type of a list does not match a runtime type

iinozemtsev opened this issue · 10 comments

https://dartpad.dev/?id=f9535fca683721dd5eb4416e364cc5df

import 'package:collection/collection.dart';
import 'dart:math';

void main() {
  final rnd = Random(9092);
  final ints = List.generate(1000, (i) => rnd.nextInt(10000));
  mergeSort<num>(ints);
  print(ints);
}

scratchSpace is created with a function type argument, and then setRange fails, because runtime element types don't match.

@natebosch @devoncarew could you please take a look?

Looks like mergeSort uses a scratchSpace which is a List<E> (type argument taken from the type argument of mergeSort, often by inference or manually also the statically known type argument of the list-being-sorted).

I haven't tried it in detail, but it looks like we'd simply need to ensure that the scratchSpace has the same type argument as the elements:

  // `src/algorithms.dart`, line 235:
  var scratchSpace = List<E>.filled(secondLength, elements[start]);

  // Could be the following, except that this would be too slow:
  var scratchSpace = elements.take(0).toList();
  var startElement = elements[start];
  for (int i = 0; i < secondLength; ++i) scratchSpace.add(startElement);

(I'm commenting on this because it's yet another example of a situation where dynamically checked covariance causes a run-time type error. So I added a reference to this issue to dart-lang/language#524.)

yanok commented
var scratchSpace = elements.take(secondLength).map((_) => elements[start]).toList()

maybe?

Hehe, who knows, but I suspect that the actual type argument passed to map will be the statically known element type of elements, in which case we won't get a list with the same actual type argument as elements after all. ;-)

void main() {
  List<num> elements = <int>[1, 2, 3];
  var secondLength = 2;
  var start = 0;
  var scratchSpace = elements.take(secondLength).map((_) => elements[start]).toList();
  print(scratchSpace.runtimeType); // 'JSArray<num>' in DartPad, should've been 'JSArray<int>'.
}

Looks like mergeSort uses a scratchSpace which is a List<E> (type argument taken from the type argument of mergeSort, often by inference or manually also the statically known type argument of the list-being-sorted).

I haven't tried it in detail, but it looks like we'd simply need to ensure that the scratchSpace has the same type argument as the elements:

  // `src/algorithms.dart`, line 235:
  var scratchSpace = List<E>.filled(secondLength, elements[start]);

  // Could be the following, except that this would be too slow:
  var scratchSpace = elements.take(0).toList();
  var startElement = elements[start];
  for (int i = 0; i < secondLength; ++i) scratchSpace.add(startElement);

There are problems to using toList like this:

  • Merge sort copies elements from A to B and from B to A, so invariance is required. toList does not have to return a list with exactly the right element type, just a subtype.
  • It leaves the efficiency of mergeSort at the mercy of the implementation type of .take(0).toList(). In the original codeList.filled can be reasonably expected to return a List that is a 'fixed length system list' with fast indexing. Who knows what implementation toList returns? Even if it is of uniform implementation type, can the optimizing compilers figure it out? If not, all the indexing is done via dispatched calls.

I think the type error in _merge can be repaired by writing a copy loop instead of setRange. In effect, the parametric covariance check happens for each element rather than the collection.

Heretical opinion: mergeSort should always copy the original List into a scratch space that is 1.5x larger than the original [start, end) slice, perform all the sorting in the scratch space, and copy back at the end (if there was any actual change). This way, the inner loops of the algorithm can be guaranteed to be operating of a fixed length system list to avoid the dispatched call, and the code can be written so that the compiler can see that the list assignments are all essentially a[i] = a[j] and avoid the parametric covariant checks by local invariance reasoning. Does this matter enough with the keyOf and compare calls? Maybe not.

We have a package which could solve this, but it cheats and uses capabilities the language normally doesn't have.
https://pub.dev/documentation/dart_internal/latest/extract_type_arguments/extractIterableTypeArgument.html

I don't think we want to spread usage of that package unless we are fairly confident that we are going to add a language level capability to replace it eventually, such as type patterns. dart-lang/language#170

@rakudrama wrote:

... invariance is required. toList does not have to return a list with exactly the right element type, just a subtype.

Right, the member signature and specified behavior of toList in Iterable<E> allows it to return a List<S> for any S <: E that allows the list to contain the given elements. The implementation shown here will return a List<E>, but any subtype could override that. Also the DartDoc is very vague on the value of the type argument.

So we don't have (and inherently can't have) any tool which is guaranteed to yield a fresh list whose run-time type implements List<E> from a given List<E>. In particular, we don't have and can't write a cloneFilled method, unless we'd edit the class List (which would break all non-subclass subtypes).

In any case, we could provide something like extractIterableTypeArgument, and I was just making the point that we'd need invariance (dynamically: from a list whose run-time type implements List<E> we want a fresh list whose run-time type implements List<E>, not just some subtype or supertype of that, and not just something based on the static types), and take(0).toList() will do that in practice. And of course that isn't good enough for SDK code. ;-)

and inherently can't have

Would type patterns enable this?

Would type patterns enable this?

Yes. Type patterns is a meta-proposal in the sense that the basic idea can be applied to static types as well as run-time types, in various ways. One typical example would be that we use type patterns to bind a type variable to the run-time value of a type parameter at a given type, and that particular feature would support the following:

void main() {
  List<num> xs = <int>[1];
  if (xs is List<final X>) { // `X` is in scope here, the value is `int`.
    var newList = List<X>.filled(someLength, xs.first);
  }
}
lrhn commented

I'd trust .toList() to do the right thing here.

The Iterable.toList is under-specified, but it should return a list with exactly the same element type as the original iterable. It's a mutable list, so the client can expect to set or add more elements. Returning a list of a subtype of the actual element type isn't safe for that, so it shouldn't ever happen.

And we should avoid anything that is generic in the return type, since as Erik says, then the return type will depend on the static type, not the runtime element type of the receiver, which is the only type known to be correct.

The most direct approach is .sublist(0, secondLength), which is specified as returning a list of the same kind as the original. This includes returning a similar typed-data list if called on a typed-data list. I'd probably use that over having an intermediate iterable.

If that can't work for some reason, I expect .take(secondLength).toList(growable: false) or .getRange(0, secondLength).toList(growable: false) will be more efficient than .take(0).toList() followed by some way of growing it to the correct length.
The copying will probably be wasted, since we already have the elements in that order, but we need to be writing something into a list to fill it out with null safety.
(An efficient ListTakeIterable.toList() could use a mem-copy to initialize a newly allocated list, but it probably doesn't.)