numpy_quaddtype smallest_subnormal is wrong
Closed this issue · 5 comments
numpy_quaddtype.get_sleef_constant("smallest_subnormal")returns QuadPrecision('3.362103143112093506262677817321753e-4932', backend='sleef'), which does not match the documented 2^-16494 from https://sleef.org/quad.xhtml.
The implementation here
numpy-user-dtypes/quaddtype/numpy_quaddtype/src/quaddtype_main.c
Lines 73 to 75 in 47c9409
looks correct, is the macro defined in the wrong way?
In any case, we should fix this and add tests for all exposed constant values.
@SwayamInSync Do you have an idea?
I'm on Mac where I get the wrong answer ...
I see, alright I'll check the sleef code for its implementation or maybe I am guessing any compiler flag is optimizing it out.
Okay so I found the issue, it is basically the way sleef is handling the cross-platform support.
On the devices like linux where __float128 is defined sleef directly uses the macro
typedef __float128 Sleef_quad;
#define SLEEF_QUAD_C(x) (x ## Q)
#define SLEEF_QUAD_DENORM_MIN SLEEF_QUAD_C(+0x0.0000000000000000000000000001p-16382)When this number converted from hexadecimal to float gets the value of 2^(-16494) (SLEEF_QUAD_DENORM_MIN)
For Mac and Windows where __float128 is not supported by default, it uses
static inline Sleef_quad sleef_q(int64_t H, uint64_t L, int E)
{
...
c.h = (((int64_t)(H) < 0 ? 1ULL : 0ULL) << 63) |
((0x7fff & (uint64_t)((E) + 16383)) << 48) |
((int64_t)(H) < 0 ? -(int64_t)(H) : (int64_t)(H) & 0xffffffffffffULL);
c.l = (uint64_t)(L);
Sleef_quad q;
memcpy(&q, &c, 16);
return q;
}
#endif
#define SLEEF_QUAD_DENORM_MIN sleef_q(+0x0000000000000LL, 0x0000000000000001ULL, -16382)The issue is in the way sleef_q is defined it is assuming exponent E is the biased exponent minus 16383
For normal numbers, the actual IEEE 754 exponent field is E + 16383
But it doesn't handle subnormal numbers correctly
So SLEEF_QUAD_DENORM_MIN is the only edge case here it seems as the fundamental problem is that sleef_q was designed for normalized numbers, where the high mantissa part (H) represents the implicit leading 1 plus the fractional part
Since we can't directly fix the sleef_q (without forking the actual sleef and using it instead of official) I will do a patch directly in our codebase @juntyr
