Poor codegen for derived `==` on simple 2-field struct
scottmcm opened this issue · 4 comments
Given a basic struct like this,

```rust
#[derive(Copy, Clone, PartialEq, Eq)]
pub struct Entity {
    g: u32,
    i: u32,
}
```

the generated `==` is suboptimal:
```rust
#[no_mangle]
pub fn derived_eq(x: &Entity, y: &Entity) -> bool {
    x == y
}
```

```asm
derived_eq:
        movq     xmm0, qword ptr [rdi]
        movq     xmm1, qword ptr [rsi]
        pcmpeqd  xmm1, xmm0
        pshufd   xmm0, xmm1, 80
        movmskpd eax, xmm0
        cmp      eax, 3
        sete     al
        ret
```
https://rust.godbolt.org/z/1b1xsnzx6
For comparison, not using short-circuiting

```rust
#[no_mangle]
pub fn good_eq(x: &Entity, y: &Entity) -> bool {
    (x.g == y.g) & (x.i == y.i)
}
```

gives much simpler codegen:

```asm
good_eq:
        mov  rax, qword ptr [rsi]
        cmp  qword ptr [rdi], rax
        sete al
        ret
```
This appears to be related to LLVM not knowing whether the second field is poison, as Alive2 confirms that LLVM isn't allowed to convert the former into the latter (at least for the optimized forms): https://alive2.llvm.org/ce/z/bAsJGN
Is there maybe some metadata we could put on the parameter attributes to tell LLVM that reading them isn't poison? It appears that just reading them first, like this (same godbolt link above),
```rust
#[no_mangle]
pub fn failed_workaround(x: &Entity, y: &Entity) -> bool {
    let Entity { g: g1, i: i1 } = *x;
    let Entity { g: g2, i: i2 } = *y;
    g1 == g2 && i1 == i2
}
```
still isn't enough for it to remove the short-circuiting: even though that emits the `!noundef` loads first, it seems like LLVM's SROAPass moves them behind the branch from `&&`.
FWIW, clang(trunk) has the same codegen difference: https://cpp.godbolt.org/z/bbaz196GP
It might not have a choice, though, since C++ references are mostly just pointers.
Miri reports no UB on the following code (Playground):
```rust
use core::mem::MaybeUninit;

#[derive(Copy, Clone, PartialEq, Eq)]
#[repr(C)]
pub struct Entity {
    g: u32,
    i: u32,
}

#[no_mangle]
pub fn derived_eq(x: &Entity, y: &Entity) -> bool {
    x == y
}

#[derive(Copy, Clone)]
#[repr(C)]
pub struct EntityUninit {
    g: u32,
    i: MaybeUninit<u32>,
}

fn main() {
    let left_ref = &Entity { g: 1, i: 1 };
    let right_uninit = EntityUninit {
        g: 0,
        i: MaybeUninit::uninit(),
    };
    let right_ref = unsafe { &*(&right_uninit as *const EntityUninit as *const Entity) };
    dbg!(derived_eq(left_ref, right_ref));
}
```
So LLVM is completely correct in its conservatism here.
@rustbot label A-codegen T-opsem
Agreed that LLVM is correct to be conservative for the currently-derived code; that's why I opened this as a "what can Rust do to let it distinguish these cases" issue. But it's also unnecessarily conservative in the `failed_workaround` example -- there we told it `!noundef` on the loads of all 4 fields before the branches, and Miri confirms that there's UB if I update your example to use that instead of just `==`: https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=7af918d80f3d5347e661a381fe675489
Here's a zulip lang thread I was just typing up about what Rust's rules for references should be to allow us to do something better; I opened this issue to seed that discussion: https://rust-lang.zulipchat.com/#narrow/stream/213817-t-lang/topic/Poison-ness.20behind.20references/near/401455323
(The zulip thread ended up with a very similar example https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=159748907525ea9c1f7d0c568ae75eb1 🙂)
Note that the two assembly snippets seem 100% equivalent. The longer version is still doing the loads unconditionally (which makes sense since the arguments are `dereferenceable`), and of course x86 has no concept of undef.
Thus, even if that particular IR transformation would be invalid, it's still suboptimal for LLVM not to find some way to optimize this.
(Perhaps by having the optimization insert `freeze` instructions?)
Interestingly, the codegen for this seems to be quite dependent on what exact `target-cpu` it's compiled for: https://rust.godbolt.org/z/ebPnGToW3 . Enabling AVX doesn't change LLVM's approach of using vector loads, it just uses more compact AVX instructions, but using `x86-64-v4` then causes it to emit the right instructions. As already mentioned by @scottmcm, this isn't exclusive to Rust either; C++ has the exact same codegen for all three cases. Because of this, I'm guessing it's just an LLVM bug, where it only chooses the best option on `x86-64-v4` even though that option doesn't actually require any special target features. It'd probably be a good idea to open an LLVM bug report about it instead of endlessly fiddling with the IR emitted by rustc to try and get LLVM to play along.