servo/tendril

question: why is Buf32 #[repr(packed)]

Closed this issue · 11 comments

If I understand https://doc.rust-lang.org/nomicon/other-reprs.html#reprpacked correctly,
#[repr(packed)] forces byte alignment. And it sounds like this may be problematic:

most architectures strongly prefer values to be aligned.

I very new to Rust (and system level programming), but this sounds scary to me.

Could you explain, why the Buf32 should be packed?

#[repr(packed)]
pub struct Buf32<H> {
    pub ptr: *mut H,
    pub len: u32,
    pub cap: u32,
}

For Buf32 I think #[repr(packed)] may not be necessary. But it’s also not problematic, because the fields happen to not require any padding: laying them out in order without padding puts each field at an offset from the start of the struct that is a multiple of its size. (u32 fields are 32-bit-aligned.)

The Tendril struct also uses #[repr(packed)]. It is also OK there for the same reason. But it is critical there because, when a Tendril represents a string of 8 bytes or less, two consecutive 32-bit fields are treated like a single [u8; 8] buffer.

@kmcallister, could #[repr(packed)] be removed on Buf32?

bluss commented

Using packed doesn't seem correct or necessary. It does technically allow you to run into UB in Rust (see rust issue 27060). It turns the struct into a minimum-alignment-1 struct, which is indeed incorrect.

It seems redundant, unless you have some very special use of Buf32? With its fields being of size 8, 4, 4 bytes (64-bit) and 4, 4, 4 bytes (32-bit), the struct already doesn't use any padding at all on common platforms.

@bluss Per rust-lang/rust#27060, packed can only cause UB when it forces fields to be unaligned which it doesn’t for these specific structs, right?

See my comment above: I think packed is not necessary on Buf32 but it is on Tendril.

bluss commented

since align_of reports 1 for Buf32, it seems to me that it allows the fields to be unaligned at the whims of the compiler, it's instructed that the struct only needs byte alignment (which is not true).

bluss commented

Even worse, generic custom code might ask align_of and place the value depending on that reported alignment, then it's very easy to produce UB (technically, but it's hard to demonstrate since x86 allows it anyway..).

Ok, I’m pretty sure we can safely remove packed from Buf32, but again I’m more worried about Tendril and its unsafe code linked above.

bluss commented

I would use repr(C) instead, then fields are well aligned but you know that the fields are laid out in order, and then it's safe. I guess we are missing some representation alternatives here (fixed order but not C).

I suppose this unsafe code should probably use an union now that that exists, but we want this library to run on stable Rust.

So, yeah. In the meantime we should probably remove packed on Buf32 and replaced it with C on Tendril.

bluss commented

I'm looking into fixing Tendril and Buf32 by using repr(C). @SimonSapin, do you know what the Header struct is for?

Non-inline tendrils (any tendril with more than 8 bytes of data) have a heap allocated chuck of memory that starts with a Header and continues with a bytes buffer that stores the string’s data. The header contains a refcount (which needs to be shared for reference-counted tendrils that share a buffer) and the capacity (size of the allocation, it’s there to keep size_of::<Tendril<_>>() smaller).

#33 switches Tendril to #[repr(C)], but leaves #[repr(packed)] on Header to keep the C API working. (An to avoid wasting 4 bytes of padding between the header and data in heap allocations, though maybe that’s not significant.)