Optimizing NumPy's quicksort with SIMD sort
Qiyu8 opened this issue · 4 comments
Hi, thanks for your awesome work. Accelerating quicksort is of great value in many areas. As a member of the NumPy team, I want to bring the SIMD sort algorithm from this repository into NumPy using universal intrinsics. As far as I can see, avx2_quicksort.cpp is the best candidate to integrate. @WojciechMula @lemire Any suggestions would be appreciated!
Can you elaborate on what you want to sort? The data type is important.
I'm afraid the algorithms in this repository are not in a ready-to-use state. The performance depends on the data distribution; IIRC, sorting large arrays of values drawn from a tiny set (like 10k elements but only 7 distinct values) is slow.
@lemire The following data types are sorted in NumPy:
NumPy type | C type | Description |
---|---|---|
np.bool_ | bool | Boolean (True or False) stored as a byte |
np.byte | signed char | Platform-defined |
np.ubyte | unsigned char | Platform-defined |
np.short | short | Platform-defined |
np.ushort | unsigned short | Platform-defined |
np.intc | int | Platform-defined |
np.uintc | unsigned int | Platform-defined |
np.int_ | long | Platform-defined |
np.uint | unsigned long | Platform-defined |
np.longlong | long long | Platform-defined |
np.ulonglong | unsigned long long | Platform-defined |
np.half / np.float16 | | Half precision float: sign bit, 5 bits exponent, 10 bits mantissa |
np.single | float | Platform-defined single precision float: typically sign bit, 8 bits exponent, 23 bits mantissa |
np.double | double | Platform-defined double precision float: typically sign bit, 11 bits exponent, 52 bits mantissa. |
np.longdouble | long double | Platform-defined extended-precision float |
np.csingle | float complex | Complex number, represented by two single-precision floats (real and imaginary components) |
np.cdouble | double complex | Complex number, represented by two double-precision floats (real and imaginary components). |
np.clongdouble | long double complex | Complex number, represented by two extended-precision floats (real and imaginary components). |
But I can set an allow-list to decide which data types are suitable for SIMD sorting; currently the SIMD sort here is prepared for uint, if I'm not mistaken.
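For illustration, a minimal C++ sketch of what such an allow-list could look like (all names here, `simd_sortable_v`, `simd_sort`, and `sort_dispatch`, are hypothetical and not NumPy's actual dispatch machinery; `std::sort` stands in for the vectorized kernel so the sketch compiles and runs):

```cpp
// Hypothetical allow-list dispatch: only element types the SIMD kernel is
// known to handle take the vectorized path; everything else stays scalar.
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <type_traits>
#include <vector>

// Compile-time allow-list; extend it as more SIMD kernels become available.
template <typename T>
constexpr bool simd_sortable_v =
    std::is_same_v<T, std::uint32_t> ||
    std::is_same_v<T, std::int32_t>  ||
    std::is_same_v<T, float>;

template <typename T>
void simd_sort(T* data, std::size_t n) {
    // Placeholder for the vectorized kernel (e.g. an AVX2 quicksort);
    // std::sort stands in so the sketch compiles and runs.
    std::sort(data, data + n);
}

template <typename T>
void sort_dispatch(T* data, std::size_t n) {
    if constexpr (simd_sortable_v<T>) {
        simd_sort(data, n);         // allow-listed type: SIMD path
    } else {
        std::sort(data, data + n);  // everything else: scalar fallback
    }
}

int main() {
    std::vector<std::uint32_t> a = {4, 1, 3, 2};
    std::vector<double> b = {4.0, 1.0, 3.0, 2.0};
    sort_dispatch(a.data(), a.size());  // takes the SIMD branch
    sort_dispatch(b.data(), b.size());  // takes the scalar fallback
    return 0;
}
```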
@WojciechMula Thanks for your advice. There is no perfect algorithm in the world: quicksort is usually the fastest, but its worst case is `O(N^2)`, so NumPy switches to the `O(N log N)` worst-case heapsort if not enough progress is made on the larger of the two quicksort partitions. This improves the worst case while still retaining the speed of quicksort in the common case. On the other hand, NumPy uses insertion sort for small arrays. Sorting large arrays of values from a tiny set is a special scenario that may be handled separately. Do you have any plans to improve the SIMD sort code to make it more practical? 😄
@Qiyu8 We don't actively work on this topic; it has somewhat faded out. But I wish we could do something. :)
BTW, you may find this project interesting: https://github.com/damageboy/vxsort-cpp