Missed Optimization: Inefficient Handling of Mutable Reference in Simple Conditional Assignment
paolobarbolini opened this issue · 5 comments
The following example shows how LLVM isn't able to optimize trivial branches when dealing with &mut
pointers, while it can with owned values. I'd expect both functions to override current_len
with 0
without needing to branch.
#[no_mangle]
pub fn unoptimized(current_len: &mut u32) {
if *current_len != 0 {
*current_len = 0;
}
}
#[no_mangle]
pub fn optimized(mut current_len: u32) -> u32 {
if current_len != 0 {
current_len = 0;
}
current_len
}
unoptimized:
cmp dword ptr [rdi], 0
je .LBB0_2
mov dword ptr [rdi], 0
.LBB0_2:
ret
optimized:
xor eax, eax
ret
define void @unoptimized(ptr noalias nocapture noundef align 4 dereferenceable(4) %current_len) unnamed_addr {
start:
%_2 = load i32, ptr %current_len, align 4
%0 = icmp eq i32 %_2, 0
br i1 %0, label %bb3, label %bb1
bb1:
store i32 0, ptr %current_len, align 4
br label %bb3
bb3:
ret void
}
define noundef i32 @optimized(i32 noundef %0) unnamed_addr {
start:
ret i32 0
}
Cc @rust-lang/opsem
Yeah, under some of the considered models for Rust references (in particular, Tree Borrows), if a function never writes to a mutable reference, then the compiler cannot make it write. This makes some unsafe code a lot easier to reason about, greatly reducing the risk of UB. But it also means we can't hoist the write out of the if
in this example.
rust-lang/unsafe-code-guidelines#133 is probably the best UCG issue for this question.
I'm not clear on what the optimizer is supposed to do here. Unconditionally write?
An unconditional write is smaller and often faster, but it's not always faster. If the cache line is not already dirty, writing will dirty it, which requires the CPU to get an exclusive lock on the cache line (could be slow if there's contention with other cores) and also requires the line to eventually be written back to main memory (using up memory throughput).
This doesn't apply to the second example function, where the argument is passed and returned in a register.