Uninitialized memory

Question

Closed this issue 4 years ago · 11 comments

Under what conditions is it valid to use any of these?

let x: T = std::mem::uninitialized(); on the stack
Box::new(std::mem::uninitialized())
The part of Vec between .len() and .capacity()
The memory pointed to by alloc::heap::allocate() without first writing to it
(Possibly many other similar cases…)

Let’s assume !Drop (or that we use std::mem::forget and are being very careful about panic-safety), and types (like u32) for which all bit patterns are valid.

Reading uninitialized memory is Bad and should be avoided, but what’s the worst that could happen? Undefined values might be fine in many cases. Or is it Undefined Behavior of the “the optimizer is allowed to eat your lunch and elide half your program” kind?

As a concrete example, consider reallocate, which reads from the source pointer when copying. That data is not necessarily entirely initialized: Vec::with_capacity(10).reserve(100)
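To make the Vec case concrete, here is a minimal sketch of how the region between .len() and .capacity() can be filled without ever reading it. It assumes a recent stable Rust (Vec::spare_capacity_mut and MaybeUninit::write postdate this thread), and fill_spare is a hypothetical helper name, not a standard library function.

```rust
use std::mem::MaybeUninit;

/// Hypothetical helper: initializes `n` slots of spare capacity with `value`
/// and only then extends `len` to cover them.
fn fill_spare(v: &mut Vec<u32>, n: usize, value: u32) {
    v.reserve(n);
    // The spare capacity is exposed as MaybeUninit, so it can be written
    // to without the uninitialized contents ever being read as u32.
    let spare: &mut [MaybeUninit<u32>] = v.spare_capacity_mut();
    for slot in &mut spare[..n] {
        slot.write(value);
    }
    let new_len = v.len() + n;
    // SAFETY: the first `n` spare slots were initialized just above.
    unsafe { v.set_len(new_len) };
}

fn main() {
    let mut v: Vec<u32> = Vec::with_capacity(10);
    // Growing may copy the old buffer, none of which is initialized here.
    v.reserve(100);
    assert_eq!(v.len(), 0);
    assert!(v.capacity() >= 100);

    fill_spare(&mut v, 3, 7);
    assert_eq!(v, [7, 7, 7]);
}
```

The point of the design is that len is only raised after the writes, so safe code can never observe an uninitialized element.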

Answer 1 · 2017-06-28T18:04:05.000Z

I would add this one to your list:

Padding bytes between struct members

Ideally, it would be nice to ensure that simply moving/copying an uninitialized value is fine, as long as you don't "use" it. This includes passing it as a parameter to another function, again as long as that function doesn't "use" the value.
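For readers unfamiliar with padding, the following sketch shows where such bytes come from. Padded and padding_bytes are hypothetical names chosen for illustration, and #[repr(C)] is assumed so that the layout is predictable.

```rust
use std::mem::{align_of, size_of};

// Hypothetical struct for illustration; repr(C) fixes the field order.
#[allow(dead_code)]
#[repr(C)]
struct Padded {
    a: u8,
    b: u32,
}

/// Number of bytes in `Padded` that belong to no field.
fn padding_bytes() -> usize {
    size_of::<Padded>() - (size_of::<u8>() + size_of::<u32>())
}

fn main() {
    // With repr(C), `b` starts at the first offset aligned for u32,
    // so the gap after `a` is align_of::<u32>() - 1 bytes.
    assert_eq!(padding_bytes(), align_of::<u32>() - 1);
    println!("size = {}, padding = {}", size_of::<Padded>(), padding_bytes());
}
```

Moving or copying a Padded value copies those gap bytes too, even though no field write ever initializes them, which is why they belong on the list.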

Answer 2 · 2017-06-28T18:09:55.000Z

What counts as using, then? Writing to a TCP socket for example is fundamentally a copy.

Answer 3 · 2017-06-28T19:09:00.000Z

Yeah, that would still be a copy, provided the socket is operating in cooked packet mode. In raw mode the OS will inspect at least the contents of the TCP header.

Answer 4 · 2017-07-11T00:48:11.000Z

So here's what miri implements; IMHO that's a good starting point and it should be mostly compatible with LLVM...

Every byte (we ignore bit-wise accesses for now; anyway Rust doesn't have bitfields) is either some value (0 <= x < 256) or "undefined" (or call it uninitialized or whatever you like). Loading four undef bytes into a u32 in unsafe code is fine, that just makes the u32 itself "undefined". Same for storing them to memory. However, addition and any other operation is UB if any of the operands is undefined.
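The rules in this paragraph can be sketched as a toy model in safe Rust. This is not miri's actual implementation: Byte, load_u32, add and UndefinedBehavior are hypothetical names, a single undefined byte is assumed to make the whole u32 undefined, and little-endian byte order and wrapping addition are arbitrary choices made to keep the sketch small.

```rust
/// One byte of abstract memory: `Some(value)` or `None` for "undefined".
type Byte = Option<u8>;

/// Marker for an operation the model rejects as undefined behavior.
#[derive(Debug, PartialEq)]
struct UndefinedBehavior;

/// Loading is always allowed; the result is defined only if all four bytes are.
fn load_u32(bytes: [Byte; 4]) -> Option<u32> {
    let mut raw = [0u8; 4];
    for (dst, src) in raw.iter_mut().zip(bytes) {
        *dst = src?; // any undefined byte makes the whole value undefined
    }
    Some(u32::from_le_bytes(raw))
}

/// Arithmetic with an undefined operand is flagged as UB in this model.
fn add(a: Option<u32>, b: Option<u32>) -> Result<u32, UndefinedBehavior> {
    match (a, b) {
        (Some(x), Some(y)) => Ok(x.wrapping_add(y)),
        _ => Err(UndefinedBehavior),
    }
}

fn main() {
    let defined = load_u32([Some(1), Some(0), Some(0), Some(0)]);
    let undefined = load_u32([Some(1), None, Some(0), Some(0)]);

    assert_eq!(defined, Some(1));
    assert_eq!(undefined, None); // the load itself is not an error

    assert_eq!(add(defined, defined), Ok(2));
    assert_eq!(add(defined, undefined), Err(UndefinedBehavior));
}
```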

I invite you to play around with miri and run your toy examples through it; if you run into missing functionality, just report a bug. :) My goal is for miri to explicitly be a tool to test such questions; of course, there's still a long way to go.

Now, when we are talking about safe code, I think this is related to #12 (comment). I would also like to propose that passing some safe external function that expects a u32 some "undef" value is UB; "undef" is not a valid inhabitant of u32. This is comparable to bool: In my proposal, storing an "invalid" value (say, 3) in a bool variable in unsafe code is NOT insta-UB as long as nobody uses that thing; however, a conditional branch on an invalid bool is UB. Still, a safe function can expect the bool it got as an argument to be valid, so the "contract" described by the type says that bool must be 0 or 1 and that u32 must be defined.
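A small sketch of the bool example, with one caveat: under the rules Rust adopted after this comment, merely producing a bool that holds 3 is itself undefined behavior, so the sketch keeps the byte inside a MaybeUninit<bool> and never reads it as a bool. store_invalid_bool is a hypothetical name.

```rust
use std::mem::MaybeUninit;

/// Stores the byte 3 in storage sized for a bool and reads it back as u8.
/// No value of type `bool` is ever produced from that storage.
fn store_invalid_bool() -> u8 {
    let mut slot = MaybeUninit::<bool>::uninit();
    // SAFETY: the pointer is valid for one byte, and u8 accepts any bit pattern.
    unsafe { slot.as_mut_ptr().cast::<u8>().write(3) };
    // SAFETY: the byte was initialized just above and is read as u8, not bool.
    unsafe { slot.as_ptr().cast::<u8>().read() }
}

fn main() {
    assert_eq!(store_invalid_bool(), 3);
    // Calling `slot.assume_init()` on such storage, or branching on the
    // result, is exactly the "use" that the comment above rules out.
}
```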

Answer 5 · 2017-07-11T05:01:03.000Z

However, addition and any other operation is UB if any of the operands is undefined.

AFAIK that's poison or stronger, not undef. Addition on undef soundly produces undef - you need to feed it into an operation that has any conditions of validity, or a conditional branch.
Unless you mean nsw (signed integers in C) addition, because with unknown inputs it could produce UB, so maybe it "always" does? But I'm not sure that's the stance LLVM takes.
See also https://lists.llvm.org/pipermail/llvm-dev/2016-October/106182.html for the future of LLVM.

Answer 6 · 2017-07-11T05:44:09.000Z

I was referring to miri's undef, which indeed in LLVM is closest to poison.

Answer 7 · 2017-07-11T09:08:01.000Z

So it sounds like branching is key to triggering UB, but arithmetic "merely" propagates poisoned values?

in unsafe code […] passing some safe external function

I’m worried about this distinction. How is it defined? In an implementation of Vec for example there is plenty of code that is not directly in an unsafe {…} block or in an unsafe fn function or method, but is "unsafe" in the sense that it is responsible for maintaining some invariants.

Answer 8 · 2017-07-11T17:01:25.000Z

So it sounds like branching is key to triggering UB, but arithmetic "merely" propagates poisoned values?

That's a difference between LLVM poison and miri undef -- the latter is UB on arithmetic.

I’m worried about this distinction. How is it defined? In an implementation of Vec for example there is plenty of code that is not directly in an unsafe {…} block or in an unsafe fn function or method, but is "unsafe" in the sense that it is responsible for maintaining some invariants.

Good question. We haven't figured out all the details yet. Notice however that most of the time, these functions assume additional invariants on top of what the type says, which would be fine with a model that checks if at least the normal type interpretation holds.

Answer 9 · 2017-07-14T22:58:41.000Z

Paraphrasing http://shape-of-code.coding-guidelines.com/2017/06/18/how-indeterminate-is-an-indeterminate-value/ for brevity:

memcpy could not be implemented in conforming C90 because copying structs with uninitialized padding was undefined behavior. C99 added wording so that uninitialized unsigned char is still indeterminate but could not have a "trap representation": reading it is not UB. Still, the value of uninitialized bytes is allowed to change with each access, as if it were volatile: unsigned char x; return x ^ x; is not guaranteed to return zero. (XOR returns zero when its two arguments are equal.)

Answer 10 · 2017-07-15T21:11:56.000Z

@SimonSapin Instead of XOR one could also use x != x or x == x.

Answer 11 · 2021-11-29T01:24:12.000Z

The story of mem::uninitialized has progressed quite a lot in the last few years (and that function has been deprecated and declared to be basically impossible to use correctly), so I will close this.
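For reference, the standard library type that replaced mem::uninitialized is std::mem::MaybeUninit. A minimal sketch of the stack case from the original question follows; init_later is a hypothetical name, and MaybeUninit::write assumes a reasonably recent stable Rust.

```rust
use std::mem::MaybeUninit;

/// Reserves stack storage for a u32, initializes it, then extracts the value.
fn init_later() -> u32 {
    // Unlike `mem::uninitialized()`, this does not claim to hold a valid u32 yet.
    let mut x = MaybeUninit::<u32>::uninit();
    x.write(42);
    // SAFETY: `x` was fully initialized by the write above.
    unsafe { x.assume_init() }
}

fn main() {
    assert_eq!(init_later(), 42);
}
```

Calling assume_init before the write would be undefined behavior even for u32, which is the conclusion this thread was circling.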