Another AVX2 version
simonlindholm opened this issue · 4 comments
In case you're interested: https://gist.github.com/simonlindholm/7f15e1c4d61813eeaa99ff71f1ada1a0
In my tests (random data, n=1e7) it runs 2.5 times faster than avx2-quicksort.cpp
, albeit at the cost of allocating external memory. (Apologies for the unreadableness, it's written in relation to a competitive programming course.)
@simonlindholm That's cool, congratulations! Did you test it against ascending, descending data and a small number of distinct values? We learnt (@lemire and I) that for these special cases vectorized algorithms perform worse than scalar ones.
These are the sort of times I see:
Input | std::sort | mine | avx2-quicksort.cpp |
---|---|---|---|
random | 0m20.209s | 0m7.631s | 0m19.476s |
increasing | 0m4.548s | 0m4.509s | 0m2.937s |
decreasing | 0m3.633s | 0m4.550s | 0m3.843s |
rand()%10 | 0m6.654s | 0m1.431s | >10 min |
i%(N/10) | 0m10.227s | 0m4.275s | 0m13.742s |
(For n = 1e7, sums of 20 sortings, on an i5-4200U, compiled with g++-5.4 -std=c++11 -O3 -march=native
.)
@simonlindholm These are interesting results.
@simonlindholm Great results!