Merge sort and AVX512
Opened this issue · 3 comments
Hi!
Many thanks for your contribution to speeding up sorting in NumPy. I wanted to ask: are there any plans to speed up merge/tim sort with AVX2/AVX512?
Hi @sergii-mamedov, that isn't on my to-do list (at least not yet) unless there is a good enough justification to work on it. Is there a reason why merge/tim sort would be preferred over quicksort?
Hi @r-devulap
I support the METASPACE project. Sorting is one of its most time-consuming steps. We have from a hundred to ten thousand already-sorted arrays (m/z spectra) that we need to combine into one sorted array. The total size of the arrays ranges from hundreds of megabytes to hundreds of gigabytes.
I tested quicksort vs. mergesort (in NumPy terminology) on this workload two years ago; mergesort was 5-10 times faster. Last weekend I tested the latest NumPy release, and mergesort is still about twice as fast on my data.
Also, merge/tim sort is a stable sort: the relative order of equal elements is preserved. This is important for this task.
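A minimal sketch of this kind of workload: many already-sorted arrays concatenated and re-sorted with NumPy's stable sort, which is where timsort's run detection pays off. The sizes, dtype, and random data below are illustrative assumptions, not METASPACE specifics.

```python
import numpy as np

rng = np.random.default_rng(0)
# Simulate 100 pre-sorted "spectra" of varying length (hypothetical sizes).
spectra = [np.sort(rng.random(rng.integers(1_000, 10_000)))
           for _ in range(100)]

# Concatenating sorted arrays yields long pre-sorted runs; kind='stable'
# (timsort for floats in NumPy) exploits those runs and keeps equal
# elements in their original order.
merged = np.sort(np.concatenate(spectra), kind='stable')

assert np.all(merged[:-1] <= merged[1:])
```

For very large inputs, a k-way merge (e.g. `heapq.merge`) avoids re-sorting entirely, but the concatenate-and-stable-sort pattern above is the simplest NumPy-native approach.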
Curious as to what data type and distribution you have in your application. On NumPy benchmarks, quicksort is slower than merge sort only for ordered and reversed arrays:
quick float64 ('random',)            37.9±0.02μs
quick float64 ('ordered',)           37.3±0.4μs
quick float64 ('reversed',)          37.8±0.2μs
quick float64 ('uniform',)           5.44±0.04μs
quick float64 ('sorted_block', 10)   36.2±0.08μs
quick float64 ('sorted_block', 100)  39.7±0.05μs
quick float64 ('sorted_block', 1000) 43.5±0.1μs
merge float64 ('random',)            609±0.7μs
merge float64 ('ordered',)           18.3±0.03μs
merge float64 ('reversed',)          12.7±0.04μs
merge float64 ('uniform',)           18.3±0.03μs
merge float64 ('sorted_block', 10)   169±1μs
merge float64 ('sorted_block', 100)  123±0.8μs
merge float64 ('sorted_block', 1000) 68.2±0.06μs
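For anyone wanting to check how these two kinds compare on their own data, here is a hedged micro-benchmark sketch. The distributions mirror the cases above, but the array size, repeat counts, and timings are arbitrary choices, not the setup NumPy's benchmark suite uses.

```python
import timeit
import numpy as np

n = 10_000
rng = np.random.default_rng(0)
cases = {
    'random':   rng.random(n),
    'ordered':  np.arange(n, dtype=np.float64),
    'reversed': np.arange(n, 0, -1, dtype=np.float64),
    'uniform':  np.full(n, 1.0),
}

for name, arr in cases.items():
    for kind in ('quicksort', 'stable'):
        # Best of 3 runs of 100 sorts each; np.sort copies its input,
        # so each call sorts the same (unsorted) array.
        t = min(timeit.repeat(lambda: np.sort(arr, kind=kind),
                              number=100, repeat=3))
        print(f'{kind:9s} {name:9s} {t * 1e4:8.1f} us/call')
```

Both kinds must of course produce the same sorted result; only the running time and stability guarantees differ.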