compile and use argsort/argselect with avx512 by g++9/10 is not safe.
Closed this issue · 1 comments
It might not be a bug, but it's better to be a warning for user.
Description
g++-9
with -O2
above and avx512 enabled will compile the argsort_n_vec
(of x86-simd-sort) with some mmx instructions (which use registers MM0~MM7
are ususally alias to x87 ST(0) ~ ST(7)), but won't clear x87 fpu state.
So operations on long double
, which used x87 instructions default, will perform different result (NAN as mainly).
The worse thing is that long double
is used by prime_hashtable_policy
which is the default backend of unorderd_map
in g++-9
.
If we call unordered_map::rehash
after argsort/argselect
(of x86-simd-sort
), rehash(size_t(-1))
was performed, and std::bad_alloc
was raised.
Unfortunately g++-9 is end of life. But if someone can upgrade g++ to newer, this problem will disappered.
Workaround
Any one below can work, but also with their own side effect.
- compile with
-mno-mmx
- make long double not to use x87 with
-mlong-double-64
or-mlong-double-128
POC
Environment
- os: ubuntu 18.04/20.04 with gcc-9.4 or ubuntu 22.04/24.04 with gcc-9.5 or gcc 9.5 docker offical image (it shall not be an os case.)
- cpu: Intel 13700H with sde, Intel 11700B, AMD 7950x, Intel Gold Xeon 6145. ( it shall not be a hardware case.)
Code
compile below code with
g++-9 -std=c++17 -Wall -O2 -march=skylake-avx512 -o poc poc.cc
// filename: poc.cc
#include <unordered_map>
#include "x86-simd-sort/src/x86simdsort-static-incl.h"
double calc(size_t size) {
auto ptr = new double[size]{0};
auto arg = x86simdsortStatic::argselect(ptr, 0, size);
#ifdef PATCH
asm volatile ("FNINIT"); // clear x87 fpu state.
#endif
auto ret = ptr[arg[0]];
delete[] ptr;
return ret;
}
int main() {
std::unordered_map<size_t, size_t> amap;
amap.clear();
calc(65); // [65, 256]
amap.rehash(1); // <- crash here
return 0;
}
screenshot
simple patch
compile the code with below, and no std::bad_alloc was thrown.
g++-9 -std=c++17 -Wall -O2 -DPATCH -march=skylake-avx512 -o poc poc.cc
gdb script to check x87 fpu is used and the state is not cleared
the ST(0)
value should be 1
.
gdb \
-ex "b *$(objdump -d --prefix-address poc | grep fildll | awk '{$2 = substr($2, 2, length($2)-2); print $2}')" \
-ex 'r' \
-ex 'print "ST(0) before do size_t to long double convert" ' \
-ex 'print $st0' \
-ex 'print "The size_t value to be converted" ' \
-ex 'print *(size_t *)($rsp+0x8)' \
-ex 'print "ST(0) after do size_t to long double convert" ' \
-ex 'si' \
-ex 'print $st0' \
--batch --silent ./poc | grep '\$'
screenshot without patch
the ST(0)
value is -NAN
which means there something wrong with x87 fpu.
screenshot after patch
the ST(0)
value is 1
as it expected to be.
Further reading, what code is genered by g++9, and where is it in argsort.
source code in argsort_n_vec
x86-simd-sort/src/xss-network-keyvaluesort.hpp
Lines 501 to 506 in 2315766
mmx instructions g++-9 generated.
1bad: 0f ef ed pxor %mm5,%mm5
1bc9: 0f ef f6 pxor %mm6,%mm6
1c09: 0f ef e4 pxor %mm4,%mm4
1c41: 0f ef db pxor %mm3,%mm3
1c71: 0f ef ff pxor %mm7,%mm7
1c85: 0f ef c9 pxor %mm1,%mm1
1c98: 0f ef c0 pxor %mm0,%mm0
1ccf: 0f ef d2 pxor %mm2,%mm2
db88: 0f ef c0 pxor %mm0,%mm0
db9b: 0f ef db pxor %mm3,%mm3
dbae: 0f ef d2 pxor %mm2,%mm2
dbc0: 0f ef c9 pxor %mm1,%mm1
Another poc without unordered_map
#include <cstdio>
#include "x86-simd-sort/src/x86simdsort-static-incl.h"
double calc(size_t size) {
auto ptr = new double[size]{1};
auto arg = x86simdsortStatic::argsort(ptr, size, false, true);
#ifdef PATCH
asm volatile ("FNINIT");
#endif
auto ret = ptr[arg[0]];
delete[] ptr;
return ret;
}
int main() {
double ret = calc(65); // [65, 256]
size_t a = 1ull;
double ret_1 = a / ret;
long double lret_1 = a / (long double)ret;
printf("ret=%lf, 1/ret=%lf, 1/(long double)ret=%Lf\n", ret, ret_1, lret_1);
return std::isnan(lret_1);
}
screenshot
screenshot after patch
The code to check that g++-10 is not safe either.
compile g++-10 -std=c++17 -Wall -O2 -march=skylake-avx512 -o poc poc.cc
and run ./poc 10
#include <cstdio>
#include "x86-simd-sort/src/x86simdsort-static-incl.h"
double calc(size_t size) {
auto ptr = new double[size]{1};
auto arg = x86simdsortStatic::argsort(ptr, size, false, true);
#ifdef PATCH
// asm volatile ("FNINIT");
asm volatile ("EMMS");
#endif
auto ret = ptr[arg[0]];
delete[] ptr;
return ret;
}
int main(int argc, const char * argv[]) {
double ret = calc(256); // [65, 256]
size_t a = std::atoll(argv[1]);
double ret_a = a / ret;
long double lret_a = a / (long double)ret;
printf("ret=%lf, %lu/ret=%lf, %lu/(long double)ret=%Lf\n", ret, a, ret_a, a, lret_a);
return std::isnan(lret_a);
}
thanks for letting us know. Probably good to a add note in the README section to build using -mno-mmx
flags when using g++ v9. Would you like to add a PR with that note?