f16 sNaN is cast to f8e5m2 qNaN in Half_To_Float8E5m2 test
Closed this issue · 4 comments
qNaN vs sNaN
f16 qNaN 0.11111.1000000000
f16 sNaN 0.11111.0100000000 (one of the examples)
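To make the distinction concrete, here is a minimal sketch that classifies raw f16 bit patterns. It assumes the usual IEEE 754-2008 convention that the most significant mantissa bit is the quiet bit (set for qNaN, clear for sNaN). The helper names are hypothetical and not part of Eigen or ml_dtypes; the `static_assert`s without a message need C++17.

```cpp
#include <cstdint>

// binary16 layout: 1 sign bit, 5 exponent bits, 10 mantissa bits.
// NaN: exponent all ones (mask 0x7C00) and a non-zero mantissa (mask 0x03FF).
constexpr bool IsNaN16(std::uint16_t bits) {
  return (bits & 0x7C00) == 0x7C00 && (bits & 0x03FF) != 0;
}

// Quiet NaN: a NaN whose top mantissa bit (0x0200) is set.
constexpr bool IsQuietNaN16(std::uint16_t bits) {
  return IsNaN16(bits) && (bits & 0x0200) != 0;
}

// Signaling NaN: a NaN whose top mantissa bit is clear.
constexpr bool IsSignalingNaN16(std::uint16_t bits) {
  return IsNaN16(bits) && (bits & 0x0200) == 0;
}

static_assert(IsQuietNaN16(0x7E00));      // 0.11111.1000000000
static_assert(IsSignalingNaN16(0x7D00));  // 0.11111.0100000000
static_assert(IsSignalingNaN16(0x7C01));  // 0.11111.0000000001
static_assert(!IsNaN16(0x7C00));          // 0.11111.0000000000 is +inf
```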
The problematic test is Half_To_Float8E5m2:
Eigen::half nan =
Eigen::numext::bit_cast<Eigen::half>(static_cast<uint16_t>(0x7C01));
EXPECT_EQ(static_cast<float8_e5m2>(nan).rep(), 0x7E);
input is 0x7C01 which is sNaN 0.11111.0000000001
expected result is 0x7E which is qNaN 0.11111.10
- This is incorrect, as the type of NaN should be preserved in static_cast.
Instead, the expected result should be 0x7D, which is sNaN 0.11111.01.
@cantonios @jakevdp @hawkinsp what do you think?
Yeah, it looks like we always quiet the NaN on conversions. We should be satisfying IEEE 6.2.3, preserving the signaling bit and payload as much as possible:
Conversion of a quiet NaN from a narrower format to a wider format in the same radix, and then back to
the same narrower format, should not change the quiet NaN payload in any way except to make it
canonical.
Conversion of a quiet NaN to a floating-point format of the same or a different radix that does not allow
the payload to be preserved shall return a quiet NaN that should provide some language-defined diagnostic
information.
I believe the current behavior is correct as per IEEE 754 6.2 p2:
Under default exception handling, any operation signaling an invalid operation exception and for which a
floating-point result is to be delivered, except as stated otherwise, shall deliver a quiet NaN.
The text in 6.2.3 p3/p4 is talking about inputs which are qNaN rather than sNaN.
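As an illustration of the quieting behavior, the sketch below narrows an f16 NaN to f8e5m2 by keeping the top 8 bits and forcing the quiet bit. This is an assumption about one simple way to get the value the test expects, not the actual ml_dtypes conversion code; `NaN16ToE5m2Quieted` is a hypothetical name, and the helper is only meaningful for NaN inputs.

```cpp
#include <cstdint>

// Hypothetical helper, not the ml_dtypes implementation.
// f8e5m2 (1 sign, 5 exponent, 2 mantissa bits) has the same layout as the
// top byte of f16, so a NaN can be narrowed by truncation. OR-ing in 0x02
// sets the top f8e5m2 mantissa bit, which makes the result a quiet NaN.
// Precondition: half_bits encodes a NaN.
constexpr std::uint8_t NaN16ToE5m2Quieted(std::uint16_t half_bits) {
  return static_cast<std::uint8_t>((half_bits >> 8) | 0x02);
}

static_assert(NaN16ToE5m2Quieted(0x7C01) == 0x7E);  // sNaN from the test
static_assert(NaN16ToE5m2Quieted(0x7E00) == 0x7E);  // qNaN stays qNaN
static_assert(NaN16ToE5m2Quieted(0xFC01) == 0xFE);  // sign bit is kept
```

Note that truncating 0x7C01 without forcing any bit would give 0x7C, which is +inf in f8e5m2, so some mantissa bit has to be set in either case. Setting the quiet bit gives the 0x7E the test expects.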
After further investigation, I found that many systems and libraries typically convert a signaling NaN (sNaN) into a quiet NaN (qNaN) during operations such as format conversion. This behavior aligns with standard practices to simplify handling and avoid triggering exceptions in subsequent computations.
I believe we can close this issue.
Special thanks to @majnemer and @cantonios for your valuable comments!