rust-lang/rust

32-bit ARM NEON intrinsics are unsound due to subnormal flushing

RalfJung opened this issue · 2 comments

This is the ARM NEON version of #114479. Example by @beetrees, to be compiled with --target armv7-unknown-linux-gnueabihf -O -Ctarget-feature=+neon:

#![feature(stdarch_arm_neon_intrinsics)]

use std::arch::arm::{float32x2_t, vadd_f32};
use std::mem::transmute;

#[inline(never)]
fn print_vals(x: float32x2_t, i: usize, vals_i: u32) {
    println!("x={x:?} i={i} vals[i]={vals_i}");
}

const ZERO: float32x2_t = unsafe { transmute([0, 0]) };
// f32::MIN_POSITIVE / 128.0 is subnormal; 128 IEEE additions of it give exactly f32::MIN_POSITIVE.
const INC: float32x2_t = unsafe { transmute([f32::MIN_POSITIVE / 128.0, f32::MIN_POSITIVE / 128.0]) };
const TARGET: [u32; 2] = unsafe { transmute([f32::MIN_POSITIVE, f32::MIN_POSITIVE]) };

#[inline(never)]
pub fn evil(vals: &[u32; 300]) {
    let mut x: float32x2_t = ZERO;
    let mut i: usize = 0;
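    // Under IEEE semantics x reaches TARGET after 64 iterations, so i never exceeds 126
    // and the bounds check on vals[i] can be optimized away. If subnormals are flushed
    // to zero instead, x never leaves ZERO and i runs past the end of vals.
    while unsafe { transmute::<float32x2_t, [u32; 2]>(x) } != TARGET {
        print_vals(x, i, vals[i]);
        x = unsafe { vadd_f32(x, INC) };
        x = unsafe { vadd_f32(x, INC) };
        i += 2;
    }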
}

pub fn main() {
    let mut vals: [u32; 300] = [0; 300];
    for i in 0..300 { vals[i as usize] = i; }
    evil(&vals);
}

#[cfg(not(target_feature = "neon"))]
compile_error!("-Ctarget-feature=+neon required");

LLVM's optimizations assume they can calculate what that loop does, and that it follows IEEE semantics. But LLVM's codegen produces code that does not have IEEE semantics, and instead flushes subnormals to zero. 💥
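To make the mismatch concrete, here is a minimal scalar sketch that runs on any target. It is not the NEON code path: `ftz`, `add_ieee`, `add_ftz` and `count_steps` are hypothetical helpers written for illustration, and `add_ftz` only emulates flush-to-zero in software, on the assumption that both subnormal inputs and subnormal results are replaced by zero. It counts how many additions of `f32::MIN_POSITIVE / 128.0` are needed to reach `f32::MIN_POSITIVE` under each model.

```rust
/// Hypothetical helper: replaces a subnormal value with a same-signed zero,
/// emulating flush-to-zero in software.
fn ftz(v: f32) -> f32 {
    if v.is_subnormal() { 0.0_f32.copysign(v) } else { v }
}

/// Plain IEEE 754 addition, as the optimizer models it.
fn add_ieee(a: f32, b: f32) -> f32 {
    a + b
}

/// Emulated flush-to-zero addition (assumption: inputs and result are flushed).
fn add_ftz(a: f32, b: f32) -> f32 {
    ftz(ftz(a) + ftz(b))
}

/// Counts the additions of MIN_POSITIVE / 128 needed to reach MIN_POSITIVE,
/// giving up after `limit` steps.
fn count_steps(add: fn(f32, f32) -> f32, limit: u32) -> Option<u32> {
    let inc = f32::MIN_POSITIVE / 128.0;
    let mut x = 0.0_f32;
    for step in 0..limit {
        if x.to_bits() == f32::MIN_POSITIVE.to_bits() {
            return Some(step);
        }
        x = add(x, inc);
    }
    None
}

fn main() {
    // IEEE: the target is reached after a fixed number of additions.
    println!("IEEE: {:?}", count_steps(add_ieee, 1000));
    // Emulated FTZ: x stays at zero, so the target is never reached.
    println!("FTZ:  {:?}", count_steps(add_ftz, 1000));
}
```

With IEEE addition the count is 128, which corresponds to 64 iterations of the loop in `evil` (two additions each), so an optimizer reasoning with IEEE semantics can conclude that `i` stays well below 300. With the emulated flush-to-zero addition the value never leaves zero and the search gives up, which matches a loop that never terminates on hardware that flushes subnormals.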

This almost surely also affects the unstable std::simd on ARM.

WG-prioritization assigning priority (Zulip discussion).

@rustbot label -I-prioritize +P-medium

@beetrees do you know if there is an LLVM issue for this? There is llvm/llvm-project#89885 but that has x86-32 and SSE in its title.

EDIT: Ah, it's llvm/llvm-project#16648.