any() on boolean vectors on 32-bit ARM likely broken
Closed this issue · 5 comments
Steps to reproduce
- Use and ARMv7 + NEON Linux host.
git clone https://github.com/hsivonen/encoding_rs
cd encoding_rs
git checkout 3049251cd80bb8eebc7d8c96057480d4e84fffef
RUSTFLAGS=' -C target-feature=+neon' cargo test --features simd-accel
Expected results
Expected tests to pass, since encoding_rs
contains no 32-bit ARM-specific code and the same code that only uses simd
-crate facilities and cross-architecture LLVM shuffles works on Aarch64.
Actual results
Various tests fail. Since it's unlikely that LLVM is broken and unlikely that the rustc-to-LLVM part is broken just for 32-bit ARM, I suspect that the implementation for any()
on boolean vectors is broken.
The implementation seems to assume that it's OK to transmute a 128-bit vector into a pair of 64-bit vectors (the 128-bit registers are aliased with two 64-bit registers). This is not what clang's arm_neon.h
does, so maybe the assumption that the transmute is OK is no longer valid with rustc and LLVM updates.
To use an aliased half-register, arm_neon.h
does this:
__ai uint8x8_t vget_high_u8(uint8x16_t __p0) {
uint8x8_t __ret;
__ret = __builtin_shufflevector(__p0, __p0, 8, 9, 10, 11, 12, 13, 14, 15);
return __ret;
}
AFAICT, to fix this, a compiler RFC to extend SIMD shuffles so that the parameter and return value lane number doesn't need to be the same is needed. I'm thinking adding simd_shuffle16to8
, etc.
@hsivonen I suspect you don't need an RFC for that. Instead, you can probably just submit a PR. The raw shuffle intrinsics are unstable today and probably will be for the foreseeable future. (So long as the spectre of integer generics looms, I suspect that will be true.) I think the quickest way to stabilization is to provide a layer above the shuffle in std
, e.g., by defining vget_high_u8
(assuming that's a vendor intrinsic?) in std
(errrmmm, I mean core
), which would internally use the appropriate shuffle.
OK. I'll try to go with the direct rustc PR route.
AFAICT, to fix this, a compiler RFC to extend SIMD shuffles so that the parameter and return value lane number doesn't need to be the same is needed. I'm thinking adding
simd_shuffle16to8
, etc.
I was wrong. rustc already supports N to M shuffles. The number is the shuffle name is the output lane count and does not limit the input lanes.