rust-lang/rust

NEG_INFINITY * 0.0 is not a negative NaN but NEG_INFINITY.mul(0.0) is

glandium opened this issue ยท 21 comments

The following code:

use std::ops::Mul;

fn main() {
    assert_eq!(1.0f64.copysign(f64::NEG_INFINITY.mul(0.0)), -1.0f64);
    assert_eq!(1.0f64.copysign(f64::NEG_INFINITY * 0.0), -1.0f64);
}

has the following output:

thread 'main' panicked at 'assertion failed: `(left == right)`
  left: `1.0`,
 right: `-1.0`', src/main.rs:5:5

In other words, the first assert_eq! passes, but the second doesn't, while they should be the same thing.

Another variant that works: https://internals.rust-lang.org/t/should-neg-infinity-0-0-return-a-negative-nan/13883/2

The first assert_eq! also fails in release mode.

MIR excerpt:

[...]
        _38 = const NaNf64;              // scope 0 at src/main.rs:5:32: 5:55
                                         // mir::Constant
                                         // + span: src/main.rs:5:32: 5:55
                                         // + literal: Const { ty: f64, val: Value(Scalar(0x7ff8000000000000)) }
        _37 = f64::<impl f64>::copysign(const 1f64, move _38) -> bb8; // scope 0 at src/main.rs:5:16: 5:56
[...]

Miri also fails on the first assertion. This combined with the fact that with LLVM it fails in release mode too makes me suspect that this is a bug in apfloat, which LLVM uses for compile time float operations. Rustc uses the rust port rustc_apfloat for const eval and by extension miri.

Note that the IEEE-754 floating-point standard states the following: "For all other operations, this standard does not specify the sign bit of a NaN result, even when there is only one input NaN, or when the NaN is produced from an invalid operation." (Multiplication is one of these "other operations".) So, the fact that both assertions behave differently is certainly not forbidden by the standard.

@silene It's allowed by the standard, but it seems inconsistent to have different behavior depending on whether an operation is evaluated at compile time or runtime.

Interestingly, the corresponding C++:

#include <iostream>
#include <cassert>
#include <cmath>

int main() {
	assert(std::copysign(1.0, -INFINITY * 0.0) == -1.0);
	return 0;
}

works with GCC at -O0 and -O3, but not clang, at neither optimization level.

It will also depend on the hardware. I've kind of known about this issue for a while (since digging into apfloat a while ago). The default behavior for APFloat's NaNs isn't the same as x86's behavior, which is problematic. In practice I believe the only long term solution is to make apfloat architecture-aware, but this is painful and diverges with what LLVM does.

That said, the printing might violate the IEEE standard, I'm not sure.

CC @workingjubilee, who has dug into some details here more recently, specifically the details around printing.

So, just to make sure I understand -- f64::NEG_INFINITY.mul(0.0) creates a NaN, and copysign is used to test the NaNs sign?

This seems like it could be considered a part or a duplicate of #73328.

Yes, this is probably a duplicate of #73328, aside from the note that it may be non-conformant not to print the sign for -nan.

However, I don't know off the top if that's actually the case, and don't have my copy of IEEE754 handy.

In practice I believe the only long term solution is to make apfloat architecture-aware, but this is painful

We did that for CompCert and it was definitely painful, yet CompCert does not even support that many architecture, compared to Rust and LLVM. And the pain was not due to formal verification, because this part cannot be verified anyway. It was a matter of deciphering the (incorrect) documentation of hardware constructors, writing a bazillion of tests to understand what the hardware actually does under the hood (e.g., is a - b physically wired as a + (-b) or not?) The only reason we went though all these hoops is because CompCert's semantics does not allow for unspecified bits (only unspecified bytes). If it did, we would gladly have marked the sign of NaN values as unspecified, as expressed by the floating-point standard.

bluss commented

Miri also fails on the first assertion. This combined with the fact that with LLVM it fails in release mode too makes me suspect that this is a bug in apfloat, which LLVM uses for compile time float operations. Rustc uses the rust port rustc_apfloat for const eval and by extension miri.

With float arithmetic not being stable in const fn, Rust could conservatively disable const eval for any float expression involving non-finite (intermediate) values - or is there a reason this can't work (? stable const expressions might be a problem).

With float arithmetic not being stable in const fn

Well, float arithmetic is promoted to 'static lifetime on stable though and the result, computed by CTFE, is observable... and when I tried to forbid that crater found various users.

And the ConstProp pass might also be doing float arithmetic. That said, since the LLVM optimizer has the same behavior, it will change little to just disable this part of the rustc optimizer.

Rust could conservatively disable const eval for any float expression involving non-finite (intermediate) values

One thing is that, short of compiler magic, you'd need some way to provide f32::NAN, f64::INFINITY et al, which are currently defined like pub const NAN: f32 = 0.0_f32 / 0.0_f32;.

One thing is that, short of compiler magic, you'd need some way to provide f32::NAN, f64::INFINITY et al, which are currently defined like pub const NAN: f32 = 0.0_f32 / 0.0_f32;

It might be fine to allow expressions with NaN outputs, but not inputs.

This would still require fixing APFloat bugs, but not as many.

IMO the best way forward for apfloat issues is to report them upstream in LLVM, help fix them there, and backport fixes to the Rust version.

I would guess that they don't care that the sign differs for NaNs on the target platform.

Also, fixing it requires complicating the API of APFloat so that it's aware of the target, so it's not a trivial change, unfortunately.

Oh right, I forgot... the spec says the sign is not specified.

In that case I am not even sure if there's a bug here, or does IEEE mandate that these operations all be deterministic? And even then that seems like something we are already tracking at #73328, so I am inclined to close this as a duplicate.

eddyb commented

The only reason we went though all these hoops is because CompCert's semantics does not allow for unspecified bits (only unspecified bytes). If it did, we would gladly have marked the sign of NaN values as unspecified, as expressed by the floating-point standard.

I wonder how infuriatingly difficult having undef bits (instead of bytes) in miri would be. It certainly feels like the clean solution, though you can't actually use undef, since it's not UB to observe those bits, just undesirable at e.g. compile-time.

I wonder how infuriatingly difficult having undef bits (instead of bytes) in miri would be. It certainly feels like the clean solution, though you can't actually use undef, since it's not UB to observe those bits, just undesirable at e.g. compile-time.

The undef_mask in Allocation would have to become per-bit... but the more interesting question is, what would that entail? There is no operation in Rust to load a single bit from memory, and any byte-wise load would "explode" the undef to the full size of the load anyway. And even if we fixed that (which might not be faithful to what LLVM does), we'd still get UB when such a "value with an undef bit" is used for any kind of arithmetic, including further floating-point arithmetic.

So I see no practical way in which this would help here.

@eddyb I see you thumbs-upped by previous comment, so I am going to close this as a duplicate of #73328.

Just to clarify, IEEE754-2019 does not specify or interpret the sign bit of the quiet NaN result from this multiplication, correct, and if the IEEE754 function copySign, that could be represented in Rust as

fn copySign<T: Float>(x: T, y, Y:) -> T

is used, it gives "unspecified sign if y is a NaN".

I'm pretty sure that's not quite right, as it depends where the NaN comes from. Operations like abs() are required to unset the sign bit even if the input is NaN.

You are correct, if a NaN has its sign changed by abs or, indeed, by being the first operand to copySign, it then does have a specified sign bit. However, if the NaN is the second operand to copySign, the resulting value does not seem to be guaranteed. Presumably if you set a NaN's sign bit, using abs and friends, and then use it as an operand, you can rely on the resulting sign bit, but perhaps not before?