Unpacking float32 does not preserve signaling NaN, unlike float64
fumoboy007 opened this issue · 10 comments
Describe the bug
float32/float64 NaNs have a quiet vs. signaling state. This state is encoded using a couple bits in the significand.
Unpacking a float64 signaling NaN using msgpack-c 6.0.0 preserves the quiet vs. signaling state. However, unpacking a float32 signaling NaN always results in a quiet NaN even if the packed value is a signaling NaN. The behavior is inconsistent between float32 and float64.
To Reproduce
static bool is_quiet_nan(double nan_val) {
uint64_t bit_pattern = reinterpret_cast<uint64_t&>(nan_val);
int is_quiet_bit_index = DBL_MANT_DIG - 2;
return (bit_pattern >> is_quiet_bit_index) & 1;
}
TEST(MSGPACKC, simple_buffer_float_signaling_nan)
{
if (!numeric_limits<float>::has_signaling_NaN) {
return;
}
float val = numeric_limits<float>::signaling_NaN();
msgpack_sbuffer sbuf;
msgpack_sbuffer_init(&sbuf);
msgpack_packer pk;
msgpack_packer_init(&pk, &sbuf, msgpack_sbuffer_write);
msgpack_pack_float(&pk, val);
msgpack_zone z;
msgpack_zone_init(&z, 2048);
msgpack_object obj;
msgpack_unpack_return ret =
msgpack_unpack(sbuf.data, sbuf.size, NULL, &z, &obj);
ASSERT_EQ(MSGPACK_UNPACK_SUCCESS, ret);
ASSERT_EQ(MSGPACK_OBJECT_FLOAT32, obj.type);
EXPECT_TRUE(isnan(obj.via.f64));
EXPECT_FALSE(is_quiet_nan(obj.via.f64));
#if defined(MSGPACK_USE_LEGACY_NAME_AS_FLOAT)
EXPECT_TRUE(isnan(obj.via.dec));
EXPECT_FALSE(is_quiet_nan(obj.via.dec));
#endif // MSGPACK_USE_LEGACY_NAME_AS_FLOAT
msgpack_zone_destroy(&z);
msgpack_sbuffer_destroy(&sbuf);
}
TEST(MSGPACKC, simple_buffer_double_signaling_nan)
{
if (!numeric_limits<double>::has_signaling_NaN) {
return;
}
double val = numeric_limits<double>::signaling_NaN();
msgpack_sbuffer sbuf;
msgpack_sbuffer_init(&sbuf);
msgpack_packer pk;
msgpack_packer_init(&pk, &sbuf, msgpack_sbuffer_write);
msgpack_pack_double(&pk, val);
msgpack_zone z;
msgpack_zone_init(&z, 2048);
msgpack_object obj;
msgpack_unpack_return ret =
msgpack_unpack(sbuf.data, sbuf.size, NULL, &z, &obj);
ASSERT_EQ(MSGPACK_UNPACK_SUCCESS, ret);
ASSERT_EQ(MSGPACK_OBJECT_FLOAT64, obj.type);
ASSERT_EQ(MSGPACK_OBJECT_FLOAT, obj.type);
#if defined(MSGPACK_USE_LEGACY_NAME_AS_FLOAT)
ASSERT_EQ(MSGPACK_OBJECT_DOUBLE, obj.type);
#endif // MSGPACK_USE_LEGACY_NAME_AS_FLOAT
EXPECT_TRUE(isnan(obj.via.f64));
EXPECT_FALSE(is_quiet_nan(obj.via.f64));
#if defined(MSGPACK_USE_LEGACY_NAME_AS_FLOAT)
EXPECT_TRUE(isnan(obj.via.dec));
EXPECT_FALSE(is_quiet_nan(obj.via.dec));
#endif // MSGPACK_USE_LEGACY_NAME_AS_FLOAT
msgpack_zone_destroy(&z);
msgpack_sbuffer_destroy(&sbuf);
}
Expected behavior
Either both tests should pass or both tests should fail. Currently, simple_buffer_double_signaling_nan
passes but simple_buffer_float_signaling_nan
fails.
Root cause
The unpacker returns float32 values as double
. This conversion implicitly causes the float32 signaling NaN to become a float64 quiet NaN.
Possible solution
The unpacker could return float32 values as float
instead of converting them to double
. This might be a breaking change because callers may need to update their usage to check a different msgpack_object_union
field. However, in addition to resolving this issue, it could also be more convenient for callers because they wouldn’t need to cast the double
field back to float
.
Thank you for reporting the issue.
I got confused. The code seems that C not C++. But your link to Root cause is C++ code.
Now C and C++ are completely different branch and different implementation.
Which one has the isssue C or C++, or both?
It seems that code is for C.
I study about signaling/quiet NaN. Now I understand the issue.
It is common in C and C++.
I consider fixing the issue with minimal code change.
This code demonstrates the issue (signaling NaN becomes quiet NaN when convert from float to double), and another possible fix.
https://godbolt.org/z/Yehe6GMGG
Your link indicates the issue is caused by the following function. I agree.
inline void unpack_float(float d, msgpack::object& o)
{ o.type = msgpack::type::FLOAT; o.via.f64 = d; }
When I apply the minimal fix, then the code would be as follows:
inline void unpack_float(float d, msgpack::object& o)
{
o.type = msgpack::type::FLOAT;
o.via.f64 = d;
if (isnan(d)) { // could be `if (d != d) { ` for porablility
if (!is_quiet_nan(d)) {
clear_quiet(o.via.f64);
}
}
}
What do you think ?
Bitimage print version: https://godbolt.org/z/rqaPx4b34
I ran my poc code on windows/msvc2019.
Here is the result:
has check
float has_quiet_NaN :1
float has_signaling_NaN :1
double has_quiet_NaN :1
double has_signaling_NaN :1
float quiet/signaling check
is_quiet_nan(fqn) :1
is_quiet_nan(fsn) :1
double quiet/signaling check
is_quiet_nan(dqn) :1
is_quiet_nan(dsn) :1
float to double converted quiet/signaling check
isnan(dqn) :1
isnan(dsn) :1
is_quiet_nan(dqn) :1
is_quiet_nan(dsn) :1 invalid
bitimage fsn :01111111110000000000000000000001
bitimage fixed_dsn before:0111111111111000000000000000000000100000000000000000000000000000
bitimage fixed_dsn after :0111111111111000000000000000000000100000000000000000000000000000
isnan(fixed_dsn) :1
is_quiet_nan(fixed_dsn) :1 valid
Linux result:
has check
float has_quiet_NaN :1
float has_signaling_NaN :1
double has_quiet_NaN :1
double has_signaling_NaN :1
float quiet/signaling check
is_quiet_nan(fqn) :1
is_quiet_nan(fsn) :0
double quiet/signaling check
is_quiet_nan(dqn) :1
is_quiet_nan(dsn) :0
float to double converted quiet/signaling check
isnan(dqn) :1
isnan(dsn) :1
is_quiet_nan(dqn) :1
is_quiet_nan(dsn) :1 invalid
bitimage fsn :01111111101000000000000000000000
bitimage fixed_dsn before:0111111111111100000000000000000000000000000000000000000000000000
bitimage fixed_dsn after :0111111111110100000000000000000000000000000000000000000000000000
isnan(fixed_dsn) :1
is_quiet_nan(fixed_dsn) :0 valid
float signaling NaN
SExp Fraction(Significand)
01111111110000000000000000000000 Windows quiet NaN
01111111110000000000000000000000 Linux quiet NaN
01111111110000000000000000000001 Windows signaling NaN
01111111101000000000000000000000 Linux signaling NaN
if Exponent is all 1
if Fraction is 0
infinity (Sign is still enabled)
else
NaN *1
else
normal value
*1
It seems that only Fraction != 0 is defined.
So there are many variations.
I think that your solution, introducing float type to object union, is better than my approach.
However, there is still problem.
- Create signaling NaN on Linux
- pack it
- send it to Windows
- unpack it on Windows, then got quiet NaN *2
similarlly
- Create signaling NaN on Windows
- pack it
- send it to Linux
- unpack it on Linux, then got quiet NaN *2
*2
I'm still not sure how to distinguish quiet or signal.
I think that the following code is not portable. Works well on Linux but doesn't work on Windows:
inline bool is_quiet_nan(float nan_val) {
std::uint32_t bit_pattern = reinterpret_cast<std::uint32_t&>(nan_val);
int is_quiet_bit_index = FLT_MANT_DIG - 2;
return (bit_pattern >> is_quiet_bit_index) & 1;
}
Perhaps msgpack-c could have a database of double/float signaling/quiet NaN bit representation on common operating systems, like Linux, macosx, Windows.
If they doesn't have conflicts, then unpack function could treat them well.
Conflict means OS-A quiet NaN bit representation is the same as OS-B signaling NaN bit representation.
Is there any information about quiet/signaling NaN bit repserentation ?
I testes on msvc2022 too.
Then got the same result as msvc2019.
I found a document:
https://learn.microsoft.com/en-us/cpp/build/ieee-floating-point-representation?view=msvc-170#nan---not-a-number
Quiet NaNs have a leading one in the significand, and get propagated through an expression. They represent an indeterminate value, such as the result of dividing by infinity, or multiplying an infinity by zero. Signaling NaNs have a leading zero in the significand.
SExp Fraction(Significand)
01111111110000000000000000000000 Windows quiet NaN
01111111110000000000000000000001 Windows signaling NaN
It seems that the document and result are not mached.
I wrote a PR to fix this issue on C++.
Then some of CI doesn't pass. They are 32bit built using -m32
option.
So I've checked the behavior using godbolt.
32bit
https://godbolt.org/z/8ETYd3Gz4
:SExp Fraction(Significand)
bitimage fqn :01111111110000000000000000000000
bitimage fsn :01111111111000000000000000000000
:SExp Fraction(Significand)
bitimage dqn :0111111111111000000000000000000000000000000000000000000000000000
bitimage dsn :0111111111111100000000000000000000000000000000000000000000000000
64bit
https://godbolt.org/z/aa8Wxoaao
:SExp Fraction(Significand)
bitimage fqn :01111111110000000000000000000000
bitimage fsn :01111111101000000000000000000000
:SExp Fraction(Significand)
bitimage dqn :0111111111111000000000000000000000000000000000000000000000000000
bitimage dsn :0111111111110100000000000000000000000000000000000000000000000000
64 bit is expected behavior. The left most bit of Fraction(Significand) indicates quiet(1) or signaling(0).
But 32 bit is both 1. But the 2nd left most bit seems to indicate quiet(0) or signaling(1).
It is very inconsintent for me.
Am I missing something?
My printing way or reinterrupt_cast would be undefined behavior ??
https://en.wikipedia.org/wiki/IEEE_754#2019
Binary
...
For NaNs, quiet NaNs and signaling NaNs are distinguished by using the most significant bit of the trailing significand field exclusively,[d] and the payload is carried in the remaining bits.
Perhaps the encoding rule for signaling/quiet NaN is very recently decided. If it is, there are various kinds of encoding...
So msgpack-c pack/unpack using its platform one. So introducing f32 is good.
How do I write test for that ...
try rising signal ??
I checked signal using fetestexcept.
signaling NaN with floating operation results:
Compiler | arch | float | double | godbolt |
---|---|---|---|---|
clang++ | 64 | raised | raised | https://godbolt.org/z/nP697jvTG |
clang++ | 32 | not raised | not raised | https://godbolt.org/z/hhs56dnnz |
g++ | 64 | raised | raised | https://godbolt.org/z/x18svh5Yr |
g++ | 32 | not raised | not raised | https://godbolt.org/z/384YddYqn |
arch | type | Fraction(Significand) MSB2bit |
---|---|---|
64bit | signaling | 01 |
64bit | quiet | 10 |
32bit | signaling | 11 |
32bit | quiet | 10 |
Thanks for looking into this @redboltz!
I think the Windows issue you encountered may just be an issue with the Windows C++ standard library implementation: https://developercommunity.visualstudio.com/t/stdnumeric-limitssignaling-nan-returns-quiet-nan/155064. Perhaps a floating point operation that generates an sNAN still has the correct bit pattern even if numeric_limits::signaling_NaN()
does not?
Is the 32-bit CPU issue you encountered the same as the above Windows issue?
Your solution of clearing the is_quiet
bit seems like it should work well. That being said, introducing an f32
field seems like the cleanest solution to me.
Actually, the Windows issue is a 32-bit CPU issue, not the other way around! More details here.
From what I understood, if a function returns an sNAN, it will always get converted to a qNAN (and raise a floating point exception) because the calling conventions on x86 specify that floating point values are returned in the x87 registers, which are inside the FPU. Hence, numeric_limits::signaling_NaN()
always returns a qNAN on x86.