rsd-devel/rsd

[Bug Report] Wrong result of `+0` minus `+0`

Opened this issue · 9 comments

Hi,
when executing +0 minus +0, the result in rsd is -0. However, i think it should be +0.

Is there something wrong in the implementation? Could you help me confirm this problem?

Thank you for finding this. Such a result is a bug. For any finite floating-point number (meaning "non-NaN") x, x - x should be +0.0. (Thus, you are right.)

IEEE754-2019 6.3 The sign bit
When the sum of two operands with opposite signs (or the difference of two operands with like signs) is exactly zero, the sign of that sum (or difference) shall be +0 under all rounding-direction attributes except roundTowardNegative;

Unfortunately, I couldn't reproduce the problem on my end. I used a test code in Processor/Src/Verification/TestCode/Asm/FP/code.s. I rewrite L13-14 to set both a1 and a2 to zero. The result of the subtraction, which would be in fa4 (f14), was confirmed to be 0x00000000 (+0.0). The only 0x80000000 (-0.0) was in fs6 (f21), which resulted from the FNMSUB instruction and was correct.

To better understand, could you please provide more details? Specifically, the following information would be helpful:

  • Sorce code and compiler version, options, etc. (or an assembly code)
  • Kanata log
  • Register.csv

We want to resolve this issue and greatly appreciate your assistance. Thank you!

Thanks for your reply.

Here are the all files you will need for debugging: https://wormhole.app/zkkA4#yJpgCuoEpgaozwhhZ-2R0w

I confirmed the test_spike.log file contains the following:

core   0: 0x000020b0 (0x040fa407) flw     fs0, 64(t6)
core   0: 3 0x000020b0 (0x040fa407) f8  0x3abff4ac mem 0x80003040
(omit)
core   0: 0x000020e8 (0x0b0fab07) flw     fs6, 176(t6)
core   0: 3 0x000020e8 (0x0b0fab07) f22 0xd54ce604 mem 0x800030b0
(omit)
core   0: 0x00003420 (0x096406d3) fsub.s  fa3, fs0, fs6
core   0: 3 0x00003420 (0x096406d3) c1_fflags 0x00000001 f13 0x554ce604

Since std::bit_cast<unsigned>( std::bit_cast<float>(0x3abff4ac) - std::bit_cast<float>(0xd54ce604) ) == 0x554ce604, this is a correct behavior.
Could you please provide the hex file of the program where the wrong behavior occurs?
Thank you.

Sorry, I lost the hex file of this bug. I manually restored the instruction sequence this afternoon according to spike's commit log. Here are the final hex:
zero_minus_zero.hex.zip

For reproduce, you need to modify PC_GOAL to 32'h000030dc and cancle define RSD_NARROW_PC in Processor/Src/Memory/MemoryMapTypes.sv.

Moreover, here is the spike commit log: spike_zero_minus_zero.log

In detail, there is a mismatch on ft8 between spike(0x00000000, as shown in the following figure) and rsd(0x80000000).
image

Thank you for your cooperation. I'll try reproducing.

It reproduced. Thank you restoring the hex file!

The root of this bug is wrong behavior of fmul.s.
First, the fmul.s ft1, ft8, fs6 is executed and should produce -0.0 because ft8 is +0.0 and fs6 is a negative number. However, in bd7c5c1, it wrongly returns +0.0 (#92).
Then, the fmsub.s ft8, ft1, ft6, ft8 with ft6 is a negative number and ft8 is +0.0 is executed. When ft1 is -0.0, it should returns +0.0 because it is a same-number subtraction. When ft1 is +0.0, it should returns -0.0 because it is a (-0.0) - (+0.0)-like subtraction. So, the behavior of fmsub.s is correct.

In summary, the fmsub.s implementation is correct and the wrong implementation of fmul.s (#92) leads to fmsub.s instruction returning -0.0.

Please do not close this issue and keep it open until it is sure that #92 is resolved and this issue no longer occurs.

The meaning of fmsub.s ft8, ft1, ft6, ft8 is ft8=ft1×ft6−ft8. In rsd, ft1=+0.0, ft6 is negative, ft8=+0.0. so it means (+0.0)*negative-(+0.0).
However, when computing (+0.0)*negative, the result in rsd is +0.0 or -0.0?
Does #92 only happens on fmul not on ordinary multiplication?

Yes, the problem #92 only happens on fmul.s.

The current fmul.s implementation on RSD wrongly returns fma(a,b,+0.0). It is not that the multiplier produces a wrong-sign result, but the last + (+0.0) causes the problem.

I see. Thanks for your patient reply!