greater method of simd structs for unsigned integral types is wrong when comparing to 0
Closed this issue · 4 comments
For unsigned integral types, the method greater
of simd struct return wrong results when comparing to 0.
For example, the following code, when compiled on a machine supporting AVX2 returns a wrong result
#include "fflas-ffpack/fflas-ffpack-config.h"
#include "fflas-ffpack/fflas-ffpack.h"
using S = Simd256<uint64_t>;
using scalar_t = S::scalar_t;
void
pprint (const typename S::vect_t v, const char *varname)
{
scalar_t array[S::vect_size];
S::storeu (array, v);
printf ("%8s =", varname);
for (unsigned int i = 0; i < S::vect_size;i++)
printf (" 0x%016lx", array[i]);
printf ("\n");
}
int
main (int argc, char *argv[])
{
typename S::vect_t a, b, t;
a = S::set (0, 1, 0x8000000000000001, 0x7fffffffffffffff);
b = S::set1 (0);
pprint (a, "a");
pprint (b, "b");
t = S::greater (a, b);
pprint (t, "a > b ?");
return 0;
}
It outputs
a = 0x0000000000000000 0x0000000000000001 0x8000000000000001 0x7fffffffffffffff
b = 0x0000000000000000 0x0000000000000000 0x0000000000000000 0x0000000000000000
a > b ? = 0x0000000000000000 0x0000000000000000 0x0000000000000000 0x0000000000000000
instead of
a = 0x0000000000000000 0x0000000000000001 0x8000000000000001 0x7fffffffffffffff
b = 0x0000000000000000 0x0000000000000000 0x0000000000000000 0x0000000000000000
a > b ? = 0x0000000000000000 0xffffffffffffffff 0xffffffffffffffff 0xffffffffffffffff
The problem comes from the implementation of greater
in simd*_int*.inl
files. For unsigned type (for which an intrisinc does not exist), it is implemented as
static INLINE CONST vect_t greater(vect_t a, vect_t b) {
vect_t x;
x = set1(-(static_cast<scalar_t>(1) << (sizeof(scalar_t) * 8 - 1)));
a = sub(x, a);
b = sub(x, b);
return _mm256_cmpgt_epi64(b,a);
}
For that exact example, changing the function to the (more reliable?) variant
static INLINE CONST vect_t greater(vect_t a, vect_t b) {
vect_t x;
x = set1((static_cast<scalar_t>(1) << (sizeof(scalar_t) * 8 - 1)));
a = vxor(x, a);
b = vxor(x, b);
return _mm256_cmpgt_epi64(a,b);
}
fixes the problem
(By the way, does it really make sense not to have set1(1ULL << 63)
since there is a hard-coded _mm256_cmpgt_epi64
anyways?)
Thanks ! I will create a PR with this change for all simd files.
Welp, I hope it does not introduce other bugs tho; I haven't tested it. (But I found it in the Hacker's delight, so it's not something out of my hat)
Welp, I hope it does not introduce other bugs tho; I haven't tested it. (But I found it in the Hacker's delight, so it's not something out of my hat)
I added tests for the special case of the comparison with zero (they were already tests for the generic case)