Vec::push invalidates interior references even when it does not reallocate
RalfJung opened this issue · 15 comments
To my knowledge, the following code is intended to be legal:
fn main() {
let mut v = Vec::with_capacity(10);
v.push(0);
let v0 = unsafe { &*(&v[0] as *const _) }; // laundering the lifetime -- we take care that `v` does not reallocate, so that's okay.
v.push(1);
let _val = *v0;
}
However, Miri currently flags this as UB. The reason is that Vec::push
implicitly (through auto-deref) calls deref_mut
to create a &mut [T]
covering all elements in the vector, which overlaps with our reference v0
in the example code above. Overlapping shared and mutable references are prohibited by Stacked Borrows.
Specifically, code for push
looks like this:
pub fn push(&mut self, value: T) {
if self.len == self.buf.cap() {
self.reserve(1);
}
unsafe {
let end = self.as_mut_ptr().add(self.len);
ptr::write(end, value);
self.len += 1;
}
}
and it is self.as_mut_ptr()
line that causes problems: as_mut_ptr
is a method on [T]
, so it's the implicit deref that creates a reference.
There's probably a bunch of other methods with similar problems. For example, fn remove(&mut self, index: usize)
also invalidates all references, and not just the [index..]
slice
Good point, I forgot to mention that the call is implicit. I amended my original post.
would it be possible to run the standard library tests under Vec
doesn't seem to have a test suite. But if it did,miri
?
We already do. :)
Specifically, this runs the #[test]
functions in src/libcore/tests, src/liballoc/tests and src/liballoc.
I just noticed push
has another, related problem: as mentioned above, the raw pointer it creates via self.as_mut_ptr()
is created from an &[T]
. This slice covers only the already existing elements of the vector, but the pointer arithmetic goes beyond the end of that slice! This is illegal, just like the following code is illegal:
fn main() { unsafe {
let x: &[u8] = &[1,2,3];
let x_01 = &x[0..2];
let raw = x_01.as_ptr();
let _val = raw.add(2).read(); // pointer is beyond the bounds of the slice it was created from!
} }
This does not currently get caught by Miri because of imprecision around our handling of raw pointers.
Making the motivating example legal seems to also make the following example legal (playground):
fn main() {
let mut v = vec![1, 2, 3, 4, 5];
v.pop();
let ptr = v.as_mut_ptr();
let ptr = unsafe { &mut *ptr };
test(&mut v, ptr);
println!("{:?}", v); // [8, 2, 3, 4, 7]
}
fn test(v: &mut Vec<u32>, x: &mut u32) {
v.push(7);
*x = 8;
}
This looks very wrong for me. It also completely breaks Prusti… Isn't there another way to fix whatever code depends on this?
What do you mean by "it breaks Prusti"?
Our current encoding assumes that the parameters v
and x
cannot alias (we have v * x
in separation logic). So, you either get a verification error or an unsoundness.
I think that model is incomplete, but that seems fine to me (unless you want to claim your tool is total). After all, one could also easily write code like
fn test(v: &mut *mut u32, x: mut u32) {
unsafe { *v.add(1) = 1; }
*x = 2;
}
and call this function with *v
and x
having the same value -- and if you are a bit careful with the way you create these pointers, that is totally fine:
let mut arr = [1u32, 2];
let v = arr.as_mut_ptr();
test(v, unsafe { &mut *v });
This is, conceptually, what happens in your example as well.
Put differently, in your example, main
cannot call an arbitrary function test
with the same type this way, that could go wrong badly. It has to know what test
does for this to be sound. So it seems expected to me that the type signature of test
alone is not sufficient to derive the precondition that main
satisfies when calling test
.
Essentially, test
is what one may call a "super-safe" function: the actual precondition required to call test
is weaker than what the types insinuate. (This is in contrast to unsafe functions whose actual precondition is stronger than what the types insinuate.) Code that wants to exploit this clearly requires custom annotations to inform the prover of this fact. This pattern can (and, I am sure, will) arise all over the place, so I don't think there is anything wrong with it also coming up around Vec
.
(My example uses raw ptrs so it does not have a "super-safe" function; maybe that makes it a bad example.)
Just to avoid potential misunderstanding: here I am interested in the semantics of Vec
. In other words, I think that my example (as well as the original example) should be rejected because they violate the invariant of Vec
. More specifically, Vec
internally uses Unique
, whose documentation says that:
Unlike
*mut T
,Unique<T>
behaves "as if" it were an instance ofT
.
My mental image of this sentence is that I should treat the following struct Foo
:
struct Foo {
f: Unique<u32>,
}
as if it was:
struct Foo {
f: u32,
}
And, therefore, be able to conclude that x
and y
do not alias:
fn bar(x: &mut Foo, y: &mut u32) { ... }
Does this make sense to you? Or maybe you see a flaw in my reasoning?
If I understand you correctly, super safe functions are the functions whose callers are allowed to call them with arguments that have broken invariants, right? My gut feeling says that such functions could result in all type invariants becoming public (I remember seeing this problem popping up with C++ friend functions), which makes reasoning really hard (not only for automatic verification but also for humans). Do you know whether anyone tried to incorporate visibility into the RustBelt proofs (prove that all code that needs to open a type invariant has access to all fields that are mentioned in it)?
By the way, I have a feeling that examples like the ones that motivated this issue are coming from the fact that programmers are (mis)using Vec
as a way to allocate memory on the heap. Wouldn't it be better to provide a data structure that serves exactly that purpose? I guess RawVec
could be a good starting point.
Vec internally uses Unique
This is a good point. However, Unique
is internal and unstable for a reason. We still need to figure out what exactly its rules are. The documentation you cite should likely be updated, but you are also quoting it out-of-context: reading the full text, I think what is meant by this is that for the purpose of the type system, Unique<T>
behaves like T
. For example, auto traits and dropck will all consider this type to contain owned T
. But Unique<T>
is certainly different from Box<T>
, so this analogy does not carry over to ownership. (This becomes particularly clear when considering empty Vec
.)
I think your test
does not violate Unique
since the unique pointer and x
are never used to access the same memory. This matches the requirements LLVM imposes on noalias
, which is what Unique
is meant to model.
I think that my example (as well as the original example) should be rejected because they violate the invariant of Vec.
I would phrase this slightly differently, namely -- these examples exploit deep knowledge of the invariant of Vec
. They break the abstraction barrier, taking apart the ownership of the backing buffer. But they don't actually violate the underlying ownership discipline.
For most types, breaking the abstraction barrier like this would be considered a bug. The exact invariant is an internal implementation detail. Vec
, however, is a bit special in that it quite explicitly exposes a lot of its implementation details in the documentation. Due to this, I think code like your example is justified.
By the way, I have a feeling that examples like the ones that motivated this issue are coming from the fact that programmers are (mis)using Vec as a way to allocate memory on the heap. Wouldn't it be better to provide a data structure that serves exactly that purpose? I guess RawVec could be a good starting point.
I think Vec
is actually meant to be used this way, so it does not constitute a misuse. But this is just me interpreting the docs to infer the intention of the authors, so don't take my word for it.
Btw, discussions in closed issues are generally discouraged and easily get lost. You are raising some interesting questions though. Could you open a new issue and provide a short summary? Then we could invite some t-libs people to determine whether this is considered within the intended usage of Vec
or not.
Sure. I will do that.
Edit: the explanation of why this is needed and allowed can be found in this comment.