rust-lang/unsafe-code-guidelines

Aliasing rules for `Unique`

RalfJung opened this issue · 5 comments

Unique is an ancient type that was originally intended to get noalias semantics, but it hasn't been treated specially by the compiler in a while. However, it could be a useful building block to let programmers inform the compiler about optimization potential.

Unique would be noalias without dereferenceable, which is not a combination that Stacked Borrows supports. However I hope that the next aliasing model will support that combination (it is also needed for the &Header issue), and then we could consider imbuing Unique with special semantics again.

This is mostly orthogonal to #326, which boils down to deciding whether we want to use Unique in standard library containers or not.

Miri now has optional support for making the Unique that is used in RawVec (and therefore Vec and VecDeque) actually meaningful. The semantics we chose for this are experimental -- we think they are sufficient to justify adding LLVM noalias for Unique arguments, but they are actually stronger than that (as in, have more UB) in ways we probably do not want.

Still, this lets us do some experiments. The good news is that the basic pattern of using a Vec for arena-like memory management is still permitted:

fn vec_push_ptr_stable() {
    let mut v = Vec::with_capacity(10);
    v.push(0);
    // Launder the lifetime. This is okay because we take care that `v` does not reallocate.
    let v0 = unsafe { &mut *(&mut v[0] as *mut _) };
    v.push(1);
    *v0 = *v0;
}

This works since after v0 is created, no actual access to that memory is ever performed via another pointer. Unique aliasing is strictly lazy; only actually accessed memory is relevant, so there is no violation here.
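The "we take care that `v` does not reallocate" part can be checked on its own. Here is a minimal sketch, relying only on the documented `Vec` guarantee that a push within the existing capacity does not reallocate. The helper name is made up for illustration and is not part of Miri or its test suite:

```rust
// Hypothetical helper, for illustration only.
// Returns true if pushing within the reserved capacity left the buffer in place.
fn push_within_capacity_keeps_address() -> bool {
    let mut v: Vec<i32> = Vec::with_capacity(10);
    v.push(0);
    let cap_before = v.capacity();
    let addr_before = v.as_ptr();
    // len (1) is below the capacity (at least 10), so this push must not reallocate.
    v.push(1);
    v.capacity() == cap_before && v.as_ptr() == addr_before
}
```

This only checks address stability. Whether the aliasing model accepts `v0` being used after the second push is the separate question discussed above.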

The bad news is that in this prototype, even some safe code raises an error:

use std::cell::Cell;
use std::iter::FromIterator;

fn main() {
    let dummies = Vec::from_iter((0..2).map(|id| Cell::new(id)));
    let d = &dummies[0];
    d.set(1);
    for dummy in &dummies {
        dummy.get();
    }
    d.set(1);
}

This is a consequence of what I mentioned above about the semantics having a lot more UB than LLVM noalias: when a function takes a Unique argument and returns some pointer derived from it, we keep requiring uniqueness on that derived pointer. In contrast, as far as I can tell, LLVM noalias stops having any effect at the end of the function. In the example above, d gets derived via a Unique, dummy gets derived separately, and so we don't allow mutation through both of them.
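To make the two derivations easier to see, here is a sketch of the same accesses with the slice conversions written out. It assumes that indexing and iteration on a `Vec` both go through the conversion to a slice, as they do in the current standard library. The function name and the returned tuple are added for illustration only:

```rust
use std::cell::Cell;

// Same accesses as the failing example, with the `Vec`-to-slice conversions spelled out.
// `as_slice` is called twice, so `d` and `dummy` come from two separate
// derivations from the buffer pointer.
fn two_derivations() -> (Vec<i32>, i32) {
    let dummies: Vec<Cell<i32>> = (0..2).map(Cell::new).collect();
    // First derivation: `d` points into the buffer.
    let d: &Cell<i32> = &dummies.as_slice()[0];
    d.set(1);
    let mut seen = Vec::new();
    // Second derivation: each `dummy` comes from a fresh slice.
    for dummy in dummies.as_slice().iter() {
        seen.push(dummy.get());
    }
    // `d` is used for mutation again after the second derivation was used.
    d.set(1);
    (seen, d.get())
}
```

This is ordinary safe Rust and runs fine outside the experimental Miri mode. The point is only that the two pointers never share a common Unique-derived parent below the `Vec` itself.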

We probably want Unique to stop having aliasing force after the function it is passed to returns. (Basically, Unique is only relevant in combination with a protector.) Or maybe Unique needs a !Freeze exception?

Note: #326 (comment)

If neither Box<T> nor Vec<T> are unique in the aliasing model, then I don't think that Unique<T> has any reason to be either.

Sure, this entire discussion presupposes that we want some form of noalias around Vec. The question of whether we really want that is tracked in #326; let's not discuss this here. :)

Here's another example of surprising UB (this time with no Cell anywhere, so clearly not fixable with an interior mutability exception):

fn main() {
    let mut s = Vec::with_capacity(10);
    s.push(13);
    let ptr = s.as_mut_ptr();
    let ptr2 = s.as_mut_ptr();
    unsafe { ptr2.write(0) };
    unsafe { ptr.write(97) };
    println!("{s:?}");
}

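The problem is that Unique::as_ptr takes the pointer by value, creating a child in the tree. And since in Tree Borrows the child structure matters even after the function returns, this causes all sorts of conflicts. Here, each as_mut_ptr call creates its own child, so the write through ptr2 conflicts with the later write through ptr.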
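The call shape that causes this can be sketched with a stand-in type. `MyUnique` and `Buf` below are hypothetical simplifications, not the real `Unique` or `RawVec`. The sketch only shows that the wrapper is passed by value on every `as_ptr` call, which is the point where the experimental semantics would create a fresh child:

```rust
use std::ptr::NonNull;

// Hypothetical, simplified stand-in for the internal `Unique<T>`.
#[derive(Clone, Copy)]
struct MyUnique<T> {
    ptr: NonNull<T>,
}

impl<T> MyUnique<T> {
    // Takes `self` by value. Under the experimental semantics, the real
    // `Unique` would be treated as unique at this call boundary.
    fn as_ptr(self) -> *mut T {
        self.ptr.as_ptr()
    }
}

// Hypothetical stand-in for a container that stores its buffer pointer as `MyUnique`.
struct Buf {
    data: MyUnique<i32>,
}

fn two_raw_pointers() -> i32 {
    let mut slot = 13;
    let buf = Buf { data: MyUnique { ptr: NonNull::from(&mut slot) } };
    // Two separate by-value calls, mirroring the two `as_mut_ptr` calls above.
    let p1 = buf.data.as_ptr();
    let p2 = buf.data.as_ptr();
    unsafe {
        p2.write(0);
        p1.write(97);
        p1.read()
    }
}
```

With a plain wrapper like this one, both raw pointers are just copies of the same pointer and the writes are fine. The conflict only appears once the wrapper type itself is given aliasing meaning.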

So for Unique, if we want to give it any kind of semantics, it definitely should cease to be relevant when the function that Unique is passed to returns. This basically means Unique only has any effect when protected, making it a lot weaker than references. I'm honestly not sure if such a weak guarantee is worth having -- though it is still good enough to justify LLVM noalias so that could be something.

I should look into the newer scope-based noalias in LLVM that also works inside functions, to see if that could lead to a stronger Unique.

I'm honestly not sure if such a weak guarantee is worth having -- though it is still good enough to justify LLVM noalias so that could be something.

This seems to be adding to the arguments of "Just Don't".