rust-lang/rust

Vec::push invalidates interior references even when it does not reallocate

RalfJung opened this issue · 15 comments

To my knowledge, the following code is intended to be legal:

fn main() {
    let mut v = Vec::with_capacity(10);
    v.push(0);
    let v0 = unsafe { &*(&v[0] as *const _) }; // laundering the lifetime -- we take care that `v` does not reallocate, so that's okay.
    v.push(1);
    let _val = *v0;
}

However, Miri currently flags this as UB. The reason is that Vec::push implicitly (through auto-deref) calls deref_mut to create a &mut [T] covering all elements in the vector, which overlaps with our reference v0 in the example code above. Overlapping shared and mutable references are prohibited by Stacked Borrows.

Specifically, code for push looks like this:

    pub fn push(&mut self, value: T) {
        if self.len == self.buf.cap() {
            self.reserve(1);
        }
        unsafe {
            let end = self.as_mut_ptr().add(self.len);
            ptr::write(end, value);
            self.len += 1;
        }
    }

and it is self.as_mut_ptr() line that causes problems: as_mut_ptr is a method on [T], so it's the implicit deref that creates a reference.

There's probably a bunch of other methods with similar problems. For example, fn remove(&mut self, index: usize) also invalidates all references, and not just the [index..] slice

Good point, I forgot to mention that the call is implicit. I amended my original post.

Vec doesn't seem to have a test suite. But if it did, would it be possible to run the standard library tests under miri?

We already do. :)

Specifically, this runs the #[test] functions in src/libcore/tests, src/liballoc/tests and src/liballoc.

I just noticed push has another, related problem: as mentioned above, the raw pointer it creates via self.as_mut_ptr() is created from an &[T]. This slice covers only the already existing elements of the vector, but the pointer arithmetic goes beyond the end of that slice! This is illegal, just like the following code is illegal:

fn main() { unsafe {
    let x: &[u8] = &[1,2,3];
    let x_01 = &x[0..2];
    let raw = x_01.as_ptr();
    let _val = raw.add(2).read(); // pointer is beyond the bounds of the slice it was created from!
} }

This does not currently get caught by Miri because of imprecision around our handling of raw pointers.

Making the motivating example legal seems to also make the following example legal (playground):

fn main() {
    let mut v = vec![1, 2, 3, 4, 5];
    v.pop();
    let ptr = v.as_mut_ptr();
    let ptr = unsafe { &mut *ptr };
    test(&mut v, ptr);
    println!("{:?}", v);    // [8, 2, 3, 4, 7]
}
fn test(v: &mut Vec<u32>, x: &mut u32) {
    v.push(7);
    *x = 8;
}

This looks very wrong for me. It also completely breaks Prusti… Isn't there another way to fix whatever code depends on this?

What do you mean by "it breaks Prusti"?

Our current encoding assumes that the parameters v and x cannot alias (we have v * x in separation logic). So, you either get a verification error or an unsoundness.

I think that model is incomplete, but that seems fine to me (unless you want to claim your tool is total). After all, one could also easily write code like

fn test(v: &mut *mut u32, x: mut u32) {
  unsafe { *v.add(1) = 1; }
  *x = 2;
}

and call this function with *v and x having the same value -- and if you are a bit careful with the way you create these pointers, that is totally fine:

let mut arr = [1u32, 2];
let v = arr.as_mut_ptr();
test(v, unsafe { &mut *v });

This is, conceptually, what happens in your example as well.

Put differently, in your example, main cannot call an arbitrary function test with the same type this way, that could go wrong badly. It has to know what test does for this to be sound. So it seems expected to me that the type signature of test alone is not sufficient to derive the precondition that main satisfies when calling test.

Essentially, test is what one may call a "super-safe" function: the actual precondition required to call test is weaker than what the types insinuate. (This is in contrast to unsafe functions whose actual precondition is stronger than what the types insinuate.) Code that wants to exploit this clearly requires custom annotations to inform the prover of this fact. This pattern can (and, I am sure, will) arise all over the place, so I don't think there is anything wrong with it also coming up around Vec.

(My example uses raw ptrs so it does not have a "super-safe" function; maybe that makes it a bad example.)

Just to avoid potential misunderstanding: here I am interested in the semantics of Vec. In other words, I think that my example (as well as the original example) should be rejected because they violate the invariant of Vec. More specifically, Vec internally uses Unique, whose documentation says that:

Unlike *mut T, Unique<T> behaves "as if" it were an instance of T.

My mental image of this sentence is that I should treat the following struct Foo:

struct Foo {
    f: Unique<u32>,
}

as if it was:

struct Foo {
    f: u32,
}

And, therefore, be able to conclude that x and y do not alias:

fn bar(x: &mut Foo, y: &mut u32) { ... }

Does this make sense to you? Or maybe you see a flaw in my reasoning?

If I understand you correctly, super safe functions are the functions whose callers are allowed to call them with arguments that have broken invariants, right? My gut feeling says that such functions could result in all type invariants becoming public (I remember seeing this problem popping up with C++ friend functions), which makes reasoning really hard (not only for automatic verification but also for humans). Do you know whether anyone tried to incorporate visibility into the RustBelt proofs (prove that all code that needs to open a type invariant has access to all fields that are mentioned in it)?

By the way, I have a feeling that examples like the ones that motivated this issue are coming from the fact that programmers are (mis)using Vec as a way to allocate memory on the heap. Wouldn't it be better to provide a data structure that serves exactly that purpose? I guess RawVec could be a good starting point.

Vec internally uses Unique

This is a good point. However, Unique is internal and unstable for a reason. We still need to figure out what exactly its rules are. The documentation you cite should likely be updated, but you are also quoting it out-of-context: reading the full text, I think what is meant by this is that for the purpose of the type system, Unique<T> behaves like T. For example, auto traits and dropck will all consider this type to contain owned T. But Unique<T> is certainly different from Box<T>, so this analogy does not carry over to ownership. (This becomes particularly clear when considering empty Vec.)

I think your test does not violate Unique since the unique pointer and x are never used to access the same memory. This matches the requirements LLVM imposes on noalias, which is what Unique is meant to model.

I think that my example (as well as the original example) should be rejected because they violate the invariant of Vec.

I would phrase this slightly differently, namely -- these examples exploit deep knowledge of the invariant of Vec. They break the abstraction barrier, taking apart the ownership of the backing buffer. But they don't actually violate the underlying ownership discipline.

For most types, breaking the abstraction barrier like this would be considered a bug. The exact invariant is an internal implementation detail. Vec, however, is a bit special in that it quite explicitly exposes a lot of its implementation details in the documentation. Due to this, I think code like your example is justified.

By the way, I have a feeling that examples like the ones that motivated this issue are coming from the fact that programmers are (mis)using Vec as a way to allocate memory on the heap. Wouldn't it be better to provide a data structure that serves exactly that purpose? I guess RawVec could be a good starting point.

I think Vec is actually meant to be used this way, so it does not constitute a misuse. But this is just me interpreting the docs to infer the intention of the authors, so don't take my word for it.

Btw, discussions in closed issues are generally discouraged and easily get lost. You are raising some interesting questions though. Could you open a new issue and provide a short summary? Then we could invite some t-libs people to determine whether this is considered within the intended usage of Vec or not.

Sure. I will do that.

Edit: the explanation of why this is needed and allowed can be found in this comment.