Consecutive Calls to keyvalue_partial_sort with the Same Key and Value Vectors Produce Incorrect Results
Closed this issue · 2 comments
Calling keyvalue_partial_sort consecutively twice on the same key vector and value vector leads to incorrect results. The detailed program is as follows. If keyvalue_partial_sort(scores.data(), ids.data(), k, num, false, true) is called twice consecutively, memcmp fails, triggering an assert.
#include <random>
#include <iostream>
#include <cstring>
#include <cassert>
#include "x86simdsort.h"
int main() {
size_t num = 200'000;
size_t k = 1'000;
std::random_device rd;
std::mt19937 gen(rd());
std::uniform_real_distribution<float> dis;
std::vector<float> scores;
std::vector<uint64_t> ids;
for (size_t i = 0; i < num; ++i) {
scores.push_back(dis(gen));
ids.push_back(i);
}
std::vector<float> scores_bckp = scores;
std::vector<uint64_t> ids_bckp = ids;
x86simdsort::keyvalue_partial_sort(scores.data(), ids.data(), k, num, false, true);
x86simdsort::keyvalue_partial_sort(scores.data(), ids.data(), k, num, false, true);
x86simdsort::keyvalue_partial_sort(scores_bckp.data(), ids_bckp.data(), k, num, false, true);
if (memcmp(ids.data(), ids_bckp.data(), k*sizeof(float)) != 0) {
std::cout << "not match " << ids[0] << " " << ids_bckp[0] << std::endl;
assert(false);
}
return 0;
}
compile command:
g++ test.cc -I./lib -I./src builddir/libx86simdsortcpp.a -std=c++20 -O3 -mavx2 -fopenmp
x86-simd-sort version: v6.0
Looking forward to your insights on this issue.
Thank you!
Hello,
As far as I can tell, this is expected behavior. All of the sorting algorithms in x86-simd-sort, including keyvalue_partial_sort, are unstable sorts. This means that keyvalue_partial_sort doesn't guarantee the ordering of values that have identical keys. From my experiments with the test case above, every run seems to have a few repeat keys, which allows for the different ordering of values. From my testing, the only difference between the results are in the ordering of values with repeated keys.
@77009833 I think #177 (comment) answers your question. Closing this.