Differences between LLVM const-eval of floats and runtime behaviour can cause miscompilations
beetrees opened this issue · 14 comments
I tried this code (based on an example from another issue, which is lightly adapted from @comex's example on a different issue):
```rust
#[inline(never)]
fn print_vals(x: f64, i: usize, vals_i: u32) {
    println!("x={x} i={i} vals[i]={vals_i}");
}

#[inline(never)]
pub fn evil(vals: &[u32; 300]) {
    // Loop variables:
    let mut x: f64 = 1.0; // x = x.sin() every time
    let mut i: usize = 0; // increments by 2 every time
    while x != 0.17755388399451055 {
        // LLVM will do a brute-force evaluation of this loop for up to 100
        // iterations to try to calculate an iteration count. (See
        // `llvm/lib/Analysis/ScalarEvolution.cpp`.) Under host floating-point
        // semantics (on x86_64-unknown-linux-gnu), `x` will equal exactly
        // 0.17755388399451055 after 90 iterations; LLVM discovers this by
        // brute-force evaluation and concludes that the iteration count is
        // always 90.
        // Now, if this loop executes 90 times, then `i` must be in the range
        // `0..180`, so the bounds check in `vals[i]` should always pass, so
        // LLVM eliminates it.
        print_vals(x, i, vals[i]);
        // Update `x`. The exact computation doesn't matter that much; it just
        // needs to:
        // (a) be possible to constant-evaluate by brute force (i.e. by going
        //     through each iteration one at a time);
        // (b) be too complex for IndVarSimplifyPass to simplify *without*
        //     brute force;
        // (c) differ depending on the current platform's floating-point math
        //     implementation.
        // `sin` is one such function.
        x = x.sin();
        // Update `i`, the integer we use to index into `vals`. Why increment
        // by 2 instead of 1? Because if we increment by 1, then LLVM notices
        // that `i` happens to be equal to the loop count, and therefore it can
        // replace the loop condition with `while i != 90`. With `i` as-is,
        // LLVM could hypothetically replace the loop condition with
        // `while i != 180`, but it doesn't.
        i += 2;
    }
}

pub fn main() {
    // Make an array on the stack:
    let mut vals: [u32; 300] = [0; 300];
    for i in 0..300 { vals[i as usize] = i; }
    evil(&vals);
}
```

I expected to see this happen: The 100% safe code never segfaults when compiled with optimisations.
Instead, this happened: When cross-compiled with optimisations to a platform whose `sin` implementation does not produce results identical to those of the platform on which the code was compiled, the resulting binary will segfault. I discovered this by cross-compiling from x86_64-unknown-linux-gnu to x86_64-pc-windows-msvc and running the resulting binary with Wine, but any pair of platforms with differing `sin` implementations will do.
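To see why the computed trip count depends on which libm is in use, the brute-force evaluation LLVM performs can be reproduced by hand. This is a standalone sketch, not part of the reproducer: the exact value 0.17755388399451055 is what the issue reports for x86_64-unknown-linux-gnu, and other `sin` implementations may differ in the last bits, which is precisely the problem.

```rust
fn main() {
    // Reproduce LLVM's brute-force loop evaluation: start from 1.0 and
    // apply `sin` repeatedly, as the `evil` loop does.
    let mut x: f64 = 1.0;
    for i in 1..=90 {
        x = x.sin();
        println!("after {i} iterations: x = {x}");
    }
    // The sequence sin(sin(...sin(1.0)...)) decreases slowly towards 0 and
    // is near 0.1776 after 90 steps on any libm, but the low-order bits of
    // the result differ between implementations, so an exact `!=` comparison
    // const-folded on the build host can disagree with the target at runtime.
}
```

Asserting the full 0.17755388399451055 bit pattern in a test would itself be platform-dependent, which is the point of the issue.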
Meta
rustc --version --verbose:

```text
rustc 1.77.2 (25ef9e3d8 2024-04-09)
binary: rustc
commit-hash: 25ef9e3d85d934b27d9dada2f9dd52b1dc63bb04
commit-date: 2024-04-09
host: x86_64-unknown-linux-gnu
release: 1.77.2
LLVM version: 17.0.6
```
WG-prioritization assigning priority (Zulip discussion).
@rustbot label -I-prioritize +P-high
Shouldn't an LLVM issue be filed for this? Ideally the results of transcendentals would be represented by a "fuzzy" range, making compile-time float equality impossible in this case.
llvm/llvm-project#89885 was filed on the llvm side
Ah, fun. So LLVM assumes that sin is deterministic. Yeah that's clearly incoherent.
Now I wonder if the non-determinism in NaNs (as specified in rust-lang/rfcs#3514) is subject to similar issues. If LLVM assumes float operations are deterministic then indeed it must not do any const-folding that differs from the effective runtime behavior.
@RalfJung Non-deterministic NaNs can cause miscompilations. Here is an example that works on x86-64 (playground):
```rust
#[inline(never)]
fn print_vals(x: f64, i: usize, vals_i: u32) {
    println!("x={x} i={i} vals[i]={vals_i}");
}

#[inline(never)]
pub fn evil(vals: &[u32; 300]) {
    const INC: f64 = f64::MAX / 90.0;
    let mut x: f64 = -1.0;
    let mut i: usize = 0;
    while x.is_sign_negative() {
        print_vals(x, i, vals[i]);
        // Use a big decrement to reach negative infinity quickly.
        x -= INC;
        // Convert infinity into a NaN (Inf - Inf = NaN)
        x = x + x - x;
        i += 2;
    }
}

pub fn main() {
    let mut vals: [u32; 300] = [0; 300];
    for i in 0..300 { vals[i as usize] = i; }
    evil(&vals);
}
```

On x86-64 (and 32-bit x86), NaNs produced at runtime will always have a negative sign, whereas NaNs produced by LLVM at compile time always have a positive sign. This means LLVM thinks the loop will exit when x becomes a NaN, but at runtime the loop will never exit.
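The Inf − Inf trick this reproducer relies on can be checked in isolation. A minimal sketch: the result is always a NaN, but which sign bit it carries is exactly the implementation-defined detail the example exploits.

```rust
fn main() {
    // Runtime: overflow a finite value to +infinity, then Inf - Inf
    // produces a NaN (mirroring the `x = x + x - x` step above).
    let inf: f64 = f64::MAX * 2.0; // overflows to +infinity
    let nan: f64 = inf - inf;
    assert!(nan.is_nan());
    // The sign bit of this runtime NaN is not guaranteed by IEEE 754;
    // on x86-64 hardware the default quiet NaN has the sign bit set,
    // while LLVM's compile-time constant folding yields a positive NaN.
    println!("NaN sign bit set: {}", nan.is_sign_negative());
}
```

A portable test can only assert that the result is a NaN; asserting its sign would reintroduce the very platform dependence under discussion.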
The fix from llvm/llvm-project#90942 will land in #127513, so some of this may be improved soon.
I've tested both the examples and can confirm that this issue has been fixed by #127513.
Awesome, let's close this then. :)
Per llvm/llvm-project#89885 (comment) that was only a partial fix, are we still exposing this in some way?