Lokathor/bytemuck

clarification on AnyBitPattern documentation

jgarvin opened this issue · 4 comments

The requirements for this is very similar to Pod, except that it doesn’t require that the type contains no uninit (or padding) bytes. This limits what you can do with a type of this kind, but also broadens the included types to repr(C) structs that contain padding as well as unions. Notably, you can only cast immutable references and owned values into AnyBitPattern types, not mutable references.

It's not clear to me why mutable references should care about uninitialized padding. I would just expect it to be ignored.

If you cast a &mut [u8] to &mut MyType and then assign a new value, the padding bytes of MyType will become de-initialized during the assignment.
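To make that concrete, here is a minimal sketch using only std, not bytemuck's API. `MyType` is a hypothetical `#[repr(C)]` struct, and `read_my_type` / `write_my_type` are illustrative helpers invented for this example. It assumes `u32` has an alignment of 4, which gives three padding bytes. The value is read out of the buffer by value and written back field by field, so the buffer's padding bytes never go through a typed assignment.

```rust
use std::mem::size_of;

/// Hypothetical stand-in for the `MyType` discussed in this thread.
#[repr(C)]
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct MyType {
    pub a: u8,
    // With repr(C) and a 4-byte-aligned u32, bytes 1..4 are padding.
    pub b: u32,
}

// Assumption made explicit: the layout is 8 bytes on this target.
const _: () = assert!(size_of::<MyType>() == 8);

/// Bytes of `MyType` that belong to no field.
pub const PADDING_BYTES: usize = size_of::<MyType>() - size_of::<u8>() - size_of::<u32>();

/// Read a `MyType` out of a byte buffer by value.
pub fn read_my_type(bytes: &[u8; 8]) -> MyType {
    // SAFETY: the buffer is exactly size_of::<MyType>() initialized bytes, and
    // every bit pattern is valid for the integer fields. `read_unaligned` is
    // used because a byte array is only guaranteed 1-byte alignment.
    unsafe { std::ptr::read_unaligned(bytes.as_ptr() as *const MyType) }
}

/// Write a `MyType` back field by field, leaving the buffer's padding bytes
/// exactly as they were (and therefore still initialized).
pub fn write_my_type(bytes: &mut [u8; 8], value: MyType) {
    bytes[0] = value.a;
    bytes[4..8].copy_from_slice(&value.b.to_ne_bytes());
}
```

By contrast, doing `*r = value` through a `&mut MyType` that aliases the buffer is a typed assignment of the whole struct, and that is the step where the padding bytes are allowed to become uninitialized.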

@Lokathor that's very surprising... I'm having trouble coming up with a mental model where that behavior makes sense. My thinking:

  • &mut [u8] already requires all the bytes of the slice be initialized
  • Once cast to &mut MyType, all of its padding bytes will be initialized, even though typically they wouldn't be required to be.
  • Once assigned to another instance, one of two things can happen: 1) the assignment avoids copying the padding bytes, in which case we still expect all bytes of the original slice to be initialized. 2) the assignment does copy the padding bytes, in which case rustc introduced UB I didn't ask for by touching those bytes to do the copy

So is this how it works: uninitialized bytes have an unknown value; copies include padding bytes; padding-byte copies done by rustc rather than by the user are special and not UB, but they still propagate the unknown-ness; and even though every bit pattern is valid for u8, rustc/LLVM want to strongly assume that none of the bytes in a slice have been infected?

I must admit I'm not an expert, but this is my current understanding:

Yes, assigning to &mut MyType also "assigns" all the padding bytes to be un-initialized, even if they were previously initialized. This allows the compiler to, for example, pick between an assignment being a field-by-field copy, or being a memcpy copy, or whatever other strategy.
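One way to picture this is a toy model in safe Rust, where each byte is either initialized (`Some`) or uninitialized (`None`). This is only an illustration of the reasoning, not how rustc or LLVM represent memory. The names `AbstractByte`, `copy_fieldwise`, `copy_memcpy`, `typed_assign`, and `valid_as_u8_slice` are made up for this sketch, and the layout assumed is a hypothetical `#[repr(C)] struct MyType { a: u8, b: u32 }` with padding at bytes 1..4.

```rust
/// Toy abstract-machine byte: `Some(v)` is initialized, `None` is uninitialized.
pub type AbstractByte = Option<u8>;

/// Assumed layout: byte 0 is `a`, bytes 1..4 are padding, bytes 4..8 are `b`.
pub const SIZE: usize = 8;
pub const PAD_START: usize = 1;
pub const PAD_END: usize = 4;

/// Strategy 1: copy field by field. Destination padding keeps its old contents.
pub fn copy_fieldwise(dst: &mut [AbstractByte; SIZE], src: &[AbstractByte; SIZE]) {
    dst[0] = src[0];
    dst[PAD_END..].copy_from_slice(&src[PAD_END..]);
}

/// Strategy 2: copy everything at once. Destination padding receives whatever
/// the source padding held, which may be uninitialized.
pub fn copy_memcpy(dst: &mut [AbstractByte; SIZE], src: &[AbstractByte; SIZE]) {
    *dst = *src;
}

/// What a typed assignment guarantees in this model: field bytes are copied and
/// padding is treated as uninitialized. That outcome is compatible with either
/// strategy above, which is what leaves the compiler free to choose.
pub fn typed_assign(dst: &mut [AbstractByte; SIZE], src: &[AbstractByte; SIZE]) {
    copy_fieldwise(dst, src);
    for byte in &mut dst[PAD_START..PAD_END] {
        *byte = None;
    }
}

/// A `&[u8]` / `&mut [u8]` view is only valid if every byte is initialized.
pub fn valid_as_u8_slice(bytes: &[AbstractByte]) -> bool {
    bytes.iter().all(|b| b.is_some())
}
```

In this model a fully initialized buffer stops passing `valid_as_u8_slice` after `typed_assign`, which mirrors why the original `&mut [u8]` can no longer be trusted once a value has been assigned through the `&mut MyType`.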

Thanks, it's surprising but it makes sense. I still think the docs could usefully link to somewhere that explains this (I don't know if there's an authoritative place?), but I'll close for now. Thanks for the explanation 👍