rust-lang/rust

Emit noundef LLVM attribute

nikic opened this issue · 6 comments

nikic commented

LLVM 11 introduces a new noundef attribute, with the following semantics:

This attribute applies to parameters and return values. If the value representation contains any undefined or poison bits, the behavior is undefined. Note that this does not refer to padding introduced by the type’s storage representation.

In LLVM 11 itself it doesn't do anything yet, but this will become important in the future to reduce the impact of freeze instructions.

We need to figure out for which parameters / return values we can emit this attribute. We generally can't do so if any bits are unspecified, e.g. due to padding. More problematic for Rust is rust-lang/unsafe-code-guidelines#71, i.e. the question of whether integers are allowed to contain uninitialized bits without going through something like MaybeUninit.

If we go with aggressive emission of noundef, we probably need to safe-guard mem::uninitialized() users with liberal application of freeze.
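For readers unfamiliar with the distinction, here is a minimal sketch of the MaybeUninit pattern that the paragraph above contrasts with mem::uninitialized(). The function name is made up for illustration. The point is only that the value has type u32 after it has been fully written, never before.

```rust
use std::mem::MaybeUninit;

/// Produces a `u32` without ever holding an uninitialized integer.
/// `MaybeUninit<u32>` is allowed to contain uninitialized bytes;
/// a plain `u32` obtained from `mem::uninitialized()` is the contested case.
fn init_via_maybe_uninit() -> u32 {
    let mut slot = MaybeUninit::<u32>::uninit();
    // Write every byte of the value before claiming it is initialized.
    slot.write(42);
    // SAFETY: `slot` was fully written on the line above.
    unsafe { slot.assume_init() }
}

fn main() {
    println!("{}", init_via_maybe_uninit());
}
```

Under this pattern a u32 passed to or returned from a function is always fully initialized, which is the property noundef would assert.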

cc @RalfJung

We generally can't do so if any bits are unspecified, e.g. due to padding.

The bits you quote explicitly say

Note that this does not refer to padding introduced by the type’s storage representation.

So it seems to me types with padding are fine?

We should definitely be able to already add this for types like bool or char, for references, and for structs consisting only of such types.
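As a rough illustration of why these types are the easy case: they already have bit patterns that are never valid, so an uninitialized value of such a type is already undefined behavior at the Rust level. The sketch below uses only standard-library calls; the helper names are hypothetical.

```rust
use std::mem::size_of;

/// Returns true if `bits` is a valid `char`, i.e. a Unicode scalar value.
fn is_valid_char(bits: u32) -> bool {
    char::from_u32(bits).is_some()
}

/// Returns true if `Option<&u8>` needs no extra space over `&u8`.
/// This works because null is never a valid reference, so the
/// compiler can use it to encode `None`.
fn reference_has_null_niche() -> bool {
    size_of::<Option<&u8>>() == size_of::<&u8>()
}

fn main() {
    println!("0x41 valid: {}", is_valid_char(0x41));
    println!("0xD800 valid: {}", is_valid_char(0xD800));
    println!("null niche: {}", reference_has_null_niche());
}
```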

Cc @rust-lang/wg-unsafe-code-guidelines

If we go with aggressive emission of noundef, we probably need to safe-guard mem::uninitialized() users with liberal application of freeze.

To be clear, adding noundef there would not be wrong (depending on the type used), we just might want to take it easy because there is a lot of broken code out there using mem::uninitialized the wrong way.

This attribute only refers to values, not memory pointed to by pointer arguments, right? I don't think there are many cases where we pass any aggregates by value in LLVM IR, except:

  • ScalarPair types are returned as anonymous 2-member structs, e.g. fn foo() -> (u8, u32) will return {i8, i32} at LLVM level. This should be covered by the quoted exemption for padding introduced by the type.
  • On some targets, an ABI lowering like "pass this big aggregate by value, on the stack" uses argument type [N x iM] (for M, N such that the size matches). I am pretty sure we can't use noundef in those cases (if the original type has padding) since it's not visible to LLVM that some bytes of this aggregate are padding at Rust level.
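To make the second bullet concrete, the sketch below counts the padding bytes in a small by-value aggregate. The struct and function names are hypothetical, and the result assumes a typical target where u32 is 4-byte aligned. If such a value were passed as an integer array of matching size, those padding bytes would become ordinary bits of the integer values, which is where noundef would be wrong.

```rust
use std::mem::size_of;

/// A `repr(C)` stand-in for a (u8, u32)-like aggregate, so that the
/// field order and padding are fixed by the C layout rules.
#[repr(C)]
pub struct Pair {
    pub a: u8,
    pub b: u32,
}

/// Number of bytes in `Pair` that belong to no field.
fn padding_bytes() -> usize {
    size_of::<Pair>() - (size_of::<u8>() + size_of::<u32>())
}

fn main() {
    println!("size = {}, padding = {}", size_of::<Pair>(), padding_bytes());
}
```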

nikic commented

We generally can't do so if any bits are unspecified, e.g. due to padding.

The bits you quote explicitly say

Note that this does not refer to padding introduced by the type’s storage representation.

So it seems to me types with padding are fine?

Sorry for being unclear on this point. Whether padding is fine depends on how the value is passed. E.g. if we have a boolean and pass it as i1, then using noundef is fine. If we have a boolean and pass it as i8 with the top seven bits unspecified, then using noundef is not possible. The same applies to aggregates -- whether we pass it as an actual aggregate, or reinterpreted as a type that makes padding accessible.

This attribute only refers to values, not memory pointed to by pointer arguments, right?

That's correct. On a pointer type noundef determines whether the pointer itself may be undef/poison, not the pointed-to memory.

Booleans are always passed as i8 with zero extension rather than bit casting, so all bits are specified. This applies both for arguments and memory accesses. Only local variables can be i1.
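The in-memory side of this can be checked from Rust directly. A small sketch (the helper name is hypothetical): the Rust reference specifies that bool has size 1, with false stored as 0x00 and true as 0x01, so every bit of the byte is specified.

```rust
/// Reinterprets a `bool` as its single-byte memory representation.
fn bool_byte(b: bool) -> u8 {
    // SAFETY: `bool` and `u8` have the same size, and every valid
    // `bool` bit pattern (0x00 or 0x01) is a valid `u8`.
    unsafe { std::mem::transmute::<bool, u8>(b) }
}

fn main() {
    println!("false = {:#04x}, true = {:#04x}", bool_byte(false), bool_byte(true));
}
```

The reverse direction, turning an arbitrary u8 into a bool, is not sound, which is the asymmetry that makes noundef safe for the i8 form.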

When generating LLVM types from Rust types, do we ever emit padding explicitly or something like that? I could imagine us "taking charge" of some aspects of enum lowering in this way.

When generating LLVM types from Rust types, do we ever emit padding explicitly or something like that?

Yes. https://rust.godbolt.org/z/xd4er1hTY

#[repr(C)]
pub struct Foo {
    a: u8,
    // 1 byte padding
    b: u16,
    // 4 bytes padding
    c: u64,
}

pub fn foo(_: Foo) {}

%Foo = type { i8, [1 x i8], i16, [2 x i16], i64 }
[...]
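The padding comments in the struct above can be cross-checked against the LLVM type by computing field offsets. This is only a sketch: the function name is hypothetical, and the expected offsets (0, 2, 8) assume a target where u64 is 8-byte aligned, such as x86_64.

```rust
use std::mem::size_of;

#[repr(C)]
pub struct Foo {
    a: u8,
    b: u16,
    c: u64,
}

/// Byte offsets of `a`, `b`, and `c` from the start of `Foo`.
fn field_offsets() -> (usize, usize, usize) {
    let foo = Foo { a: 0, b: 0, c: 0 };
    let base = &foo as *const Foo as usize;
    (
        &foo.a as *const u8 as usize - base,
        &foo.b as *const u16 as usize - base,
        &foo.c as *const u64 as usize - base,
    )
}

fn main() {
    // Gaps between the end of one field and the start of the next are the
    // padding that shows up as [1 x i8] and [2 x i16] in the LLVM type.
    println!("offsets = {:?}, size = {}", field_offsets(), size_of::<Foo>());
}
```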