Merge sort and AVX512

Question

Merge sort and AVX512

Opened this issue 8 months ago · 3 comments

Hi!

Many thanks for your contribution to speeding up sorting in numpy. Wanted to ask if there are any plans to speed up merge/tim sort with AVX2/AVX512?

Answer 1 · 2023-11-09T21:02:48.000Z

Hi @sergii-mamedov that isn't in my to do list (at least not yet) unless there is a good enough justification to work on those. Is there a reason why merge/tim sort will be preferred over quicksort?

Answer 2 · 2023-11-09T22:10:37.000Z

Hi @r-devulap
I was supported the METASPACE project. Sorting is one of the most time-consuming steps. We have from a hundred to ten thousand already sorted arrays (m/z spectra), which we need to combine into one array and sort. The total size of arrays ranges from hundreds of megabytes to hundreds of gigabytes.
I tested quicksort vs mergesort (in numpy terminology) two years ago - mergesort was 5-10 times faster. Last weekend I tested the latest release of numpy - mergesort is still about twice as fast on my data.
Also, a feature of merge/tim sort is that it is a stable variant of sorting, that is, the order of elements is determined. This is important for this task.

Answer 3 · 2023-11-09T23:10:14.000Z

curious as to what data type and distribution you have in your application. On NumPy benchmarks, quicksort is slower than merge sort only for ordered and reverse arrays:

             quick   float64        ('random',)         37.9±0.02μs
              quick   float64        ('ordered',)         37.3±0.4μs
              quick   float64       ('reversed',)         37.8±0.2μs
              quick   float64        ('uniform',)        5.44±0.04μs
              quick   float64    ('sorted_block', 10)    36.2±0.08μs
              quick   float64   ('sorted_block', 100)    39.7±0.05μs
              quick   float64   ('sorted_block', 1000)    43.5±0.1μs
              merge   float64        ('random',)          609±0.7μs
              merge   float64        ('ordered',)        18.3±0.03μs
              merge   float64       ('reversed',)        12.7±0.04μs
              merge   float64        ('uniform',)        18.3±0.03μs
              merge   float64    ('sorted_block', 10)      169±1μs
              merge   float64   ('sorted_block', 100)     123±0.8μs
              merge   float64   ('sorted_block', 1000)   68.2±0.06μs