How Intel simd sort implements Argsort with different inputs?
Closed this issue · 2 comments
A big thanks to the Intel team for providing such an optimized sorting algorithm with AVX2 and AVX512 versions. I'm new to Intel's SIMD sort source code and trying to understand how argsort is implemented using AVX2. I have a couple of questions:
- How are the indices swapped along with the input array during sorting?
- Typically, the indices array is of type size_t, while the input array can have various data types (e.g., float16, int16, float32, float64). How are an equal number of elements and indices efficiently loaded into SIMD vectors?
- Why hasn’t an SSE-based argsort implementation been developed? Are there any limitations to implementing argsort with SSE?
Thanks in advance!
Hello!,
I'll go over your questions in order:
1.
The argsort/argselect function don't modify the input array, only the indices are modified.
When doing the partitions/small sorts, a gather is used to load values based on the indices currently being processed.
Then the loaded index/value pairs are sorted together (a small key-value sort essentially), but then only the indices are stored back into memory.
We use the method of having half-length vectors when one type is smaller than the other; so when using AVX512, with 64-bit indices but 32-bit values, the indices are loaded as 8 8-byte indices into a ZMM register, and the values are loaded as 8 4-byte values into a YMM register.
If that same scenario was using AVX2, it would use YMM registers for the indices and XMM registers for the values
(Also argsort/argselect don't currently support 16 bit types, that was a documentation error)
While admittedly I haven't fully looked into the possibility for SSE based argsort, that previous point poses a problem for our argsort method; there are no half length vectors for SSE.
I don't get the sense it would perform well/better than well written scalar sorts, though I don't know that for sure
Please let me know if you have any other questions!
I hope that answers your question @zafeerali943. Closing this for now, feel free to reach out in case you have more questions.