numpy/x86-simd-sort

Improve argsort for 32-bit

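32-bit argsort currently uses ymm registers. We can switch to zmm registers (using 2x i64gather instructions to fill each one, since a single i64gather with 64-bit indices returns only 8 32-bit keys) and add new bitonic networks for the 16-lane vectors.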
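
To make the proposal concrete, here is a rough scalar model of what the zmm path would compute. It is only a sketch, not the library's code. All names (`gather8`, `gather16`, `bitonic_argsort16`, `argsort16`) are hypothetical, and the loops stand in for SIMD instructions. `gather8` plays the role of one `_mm512_i64gather_epi32`, which gathers 8 32-bit elements through 8 64-bit indices. The network is the textbook 16-input bitonic sorter applied to key/index pairs, not necessarily the network the library would end up using.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <utility>

// Number of 32-bit lanes in one 512-bit (zmm) register.
constexpr std::size_t kLanes = 16;

// Hypothetical scalar stand-in for one 8-lane gather of 32-bit keys
// through 64-bit indices.
static void gather8(const int32_t* keys, const int64_t* idx, int32_t* out) {
    for (std::size_t i = 0; i < 8; ++i) out[i] = keys[idx[i]];
}

// Fill all 16 key lanes with two 8-lane gathers: low half, then high half.
static std::array<int32_t, kLanes> gather16(const int32_t* keys,
                                            const int64_t* idx) {
    std::array<int32_t, kLanes> lanes{};
    gather8(keys, idx, lanes.data());
    gather8(keys, idx + 8, lanes.data() + 8);
    return lanes;
}

// Textbook 16-input bitonic sorting network. Every compare-exchange moves
// the key and its index together, so idx ends up as the argsort result.
static void bitonic_argsort16(std::array<int32_t, kLanes>& key,
                              std::array<int64_t, kLanes>& idx) {
    for (std::size_t k = 2; k <= kLanes; k <<= 1) {        // merge size
        for (std::size_t j = k >> 1; j > 0; j >>= 1) {     // compare distance
            for (std::size_t i = 0; i < kLanes; ++i) {
                const std::size_t partner = i ^ j;
                if (partner <= i) continue;                // visit each pair once
                const bool ascending = (i & k) == 0;
                const bool out_of_order = ascending ? key[i] > key[partner]
                                                    : key[i] < key[partner];
                if (out_of_order) {
                    std::swap(key[i], key[partner]);
                    std::swap(idx[i], idx[partner]);
                }
            }
        }
    }
}

// Argsort exactly 16 keys: returns indices ordering keys ascending.
static std::array<int64_t, kLanes> argsort16(const int32_t* keys) {
    std::array<int64_t, kLanes> idx{};
    for (std::size_t i = 0; i < kLanes; ++i) idx[i] = static_cast<int64_t>(i);
    std::array<int32_t, kLanes> lanes = gather16(keys, idx.data());
    bitonic_argsort16(lanes, idx);
    return idx;
}
```

Calling `argsort16(keys)` on a 16-element `int32_t` array returns the indices that order the keys ascending. In a real zmm version, each pass of the inner `i` loop would become a shuffle plus a masked min/max across all 16 lanes. Because the 16 indices are 64-bit, they would still occupy two zmm registers.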