rust-lang/rust

Tracking Issue for layout information behind pointers

CAD97 opened this issue · 14 comments

CAD97 commented

The feature gate for the issue is #![feature(layout_for_ptr)].

This tracks three functions:

  • core::mem::size_of_val_raw<T: ?Sized>(val: *const T) -> usize
  • core::mem::align_of_val_raw<T: ?Sized>(val: *const T) -> usize
  • core::alloc::Layout::for_value_raw<T: ?Sized>(t: *const T) -> Layout

These provide raw-pointer variants of the existing mem::size_of_val, mem::align_of_val, and Layout::for_value.
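For comparison, here is a minimal sketch of the existing reference-based functions on stable Rust. The helper name `layout_info` is hypothetical and only bundles the three stable calls; the `_raw` variants tracked here would take a `*const T` instead of a `&T`.

```rust
use std::alloc::Layout;
use std::mem;

/// Hypothetical helper (not part of std): runs the three stable,
/// reference-based layout queries and returns (size, align, layout).
fn layout_info<T: ?Sized>(val: &T) -> (usize, usize, Layout) {
    (
        mem::size_of_val(val),
        mem::align_of_val(val),
        Layout::for_value(val),
    )
}

fn main() {
    // A slice is unsized, so its size comes from the pointer metadata.
    let data = [1u32, 2, 3];
    let (size, align, layout) = layout_info(&data[..]);
    assert_eq!(size, 3 * mem::size_of::<u32>());
    assert_eq!(align, mem::align_of::<u32>());
    assert_eq!((layout.size(), layout.align()), (size, align));

    // `str` works the same way: five one-byte elements.
    let (size, align, _) = layout_info("hello");
    assert_eq!((size, align), (5, 1));
}
```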

About tracking issues

Tracking issues are used to record the overall progress of implementation.
They are also used as hubs connecting to other relevant issues, e.g., bugs or open design questions.
A tracking issue is however not meant for large scale discussion, questions, or bug reports about a feature.
Instead, open a dedicated issue for the specific matter and add the relevant feature gate label.

Unresolved Questions

  • What should the exact safety requirements of these functions be? It is currently possible, via ptr::slice_from_raw_parts, to create raw pointers whose metadata would make the size computation wrap. Trait vtable pointers are currently required to always be valid, but this is not guaranteed, and it is an open question whether this should be required of invalid pointers.
  • How should this interact with extern types? As this is a new API surface, we could potentially handle them properly, whereas the _of_val functions cannot. Additionally, extern types may want to access the value to determine the size (e.g. a null-terminated cstr).
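To illustrate the wrapping concern in the first question, here is a small sketch on stable Rust. The helper names `naive_slice_size` and `checked_slice_size` are hypothetical. The sketch only does the arithmetic and never calls the `_raw` functions on the oversized pointer.

```rust
use std::alloc::Layout;
use std::mem;
use std::ptr;

/// Hypothetical helper: the size a wrapping computation would report.
fn naive_slice_size<T>(len: usize) -> usize {
    len.wrapping_mul(mem::size_of::<T>())
}

/// Hypothetical helper: the size of a `[T]` with `len` elements, or `None`
/// if `Layout::array` rejects it (overflow or larger than `isize::MAX`).
fn checked_slice_size<T>(len: usize) -> Option<usize> {
    Layout::array::<T>(len).ok().map(|layout| layout.size())
}

fn main() {
    // Creating this pointer is safe; it is never dereferenced here.
    let huge: *const [u64] = ptr::slice_from_raw_parts(ptr::null::<u64>(), usize::MAX);
    let _ = huge;

    // The true size does not fit in `usize`, so a naive product wraps.
    assert_eq!(naive_slice_size::<u64>(usize::MAX), usize::MAX - 7);

    // A checked computation reports the problem instead.
    assert_eq!(checked_slice_size::<u64>(usize::MAX), None);
    assert_eq!(checked_slice_size::<u64>(4), Some(32));
}
```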

rust-lang/lang-team#166 is tangentially related, as it serves to document what requirements currently exist on pointee types and getting a known layout from them.

Implementation history

  • #69079 implemented these functions, including the intrinsic adjustments required to support them.

Potential bikeshed: maybe these methods should be moved to the ptr module instead of mem, where they can drop the _raw suffix. There already is some precedent for methods in these modules sharing names (e.g. ptr::swap versus mem::swap) and I think it feels more natural.

The safety documentation on these functions is somewhat inaccurate. It states:

This function is only safe to call if the following conditions hold: [...]

  • If the unsized tail of T is:
    • a slice, then the length of the slice tail must be an initialized integer, and the size of the entire value (dynamic tail length + statically sized prefix) must fit in isize.

But the size of a custom slice DST is not necessarily the sum of the size of its prefix and the size of its slice tail. If the alignment of the prefix is greater than the alignment of the slice type, the compiler will insert additional padding following the slice, which is counted in the full DST size.

I stumbled on this issue when investigating custom repr(Rust) slice DSTs. Except when the slice type is a generic parameter subject to an unsizing coercion, such a DST cannot be constructed at all, since there is no sound way for users to query its layout. These functions nearly allow users to determine the size of such DSTs, but the safety requirements prevent this. We cannot directly get the layout of a null pointer with the desired length, since we have no way to determine the size of the prefix. We cannot even extrapolate from the layout of a 0-length DST pointer, since the compiler could add arbitrary padding once the length is increased. It would be nice if there were a fallible way to query the layout information, since this would trivially allow such DSTs to be allocated and initialized.

CAD97 commented

But the size of a custom slice DST is not necessarily the sum of the size of its prefix and the size of its slice tail. If the alignment of the prefix is greater than the alignment of the slice type, the compiler will insert additional padding following the slice, which is counted in the full DST size.

Well, the padding is statically sized, so in that sense, it's part of the statically sized prefix.

I also agree that there should be a fallible way to query layout information from pointer metadata, and have an experimental PR #95832 open to determine the cost of making size_of sound to call for arbitrarily sized slice tails (by saturating).

Well, the padding is statically sized, so in that sense, it's part of the statically sized prefix.

In what sense is it statically sized? Consider this test program (playground):

#![feature(layout_for_ptr)]

use std::{mem, ptr};

const PREFIX: usize = 4;

#[repr(C)]
struct DST {
    align: [u64; 0],
    prefix: [u8; PREFIX],
    slice: [u8],
}

fn main() {
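    for len in 0..32 {
        // Build a null `*const DST` whose slice metadata is `len`.
        let ptr = ptr::slice_from_raw_parts(ptr::null::<()>(), len);
        let ptr = ptr as *const DST;
        let size = unsafe { mem::size_of_val_raw(ptr) };
        // Bytes of the total size not covered by the prefix or the slice.
        print!("{} ", size - PREFIX - len);
    }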
    println!();
}

You can adjust the size of the prefix. With a 4-byte prefix, this is the output:

4 3 2 1 0 7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0 7 6 5 

These are the differences between the PREFIX + len and the total size for each len. Clearly, this padding is dependent on the length of the slice.
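The pattern in that output is consistent with the total size being rounded up to the struct's alignment. The following sketch reproduces the first eight values with plain arithmetic on stable Rust. The function names are hypothetical, and the alignment of 8 is an assumption matching `[u64; 0]` on common 64-bit targets.

```rust
/// Rounds `n` up to the next multiple of `align` (must be a power of two).
fn round_up(n: usize, align: usize) -> usize {
    (n + align - 1) & !(align - 1)
}

/// Hypothetical model of the trailing padding: total size is
/// `prefix + len` rounded up to `align`, minus the bytes actually used.
fn trailing_padding(prefix: usize, len: usize, align: usize) -> usize {
    round_up(prefix + len, align) - prefix - len
}

fn main() {
    // PREFIX = 4 and an assumed alignment of 8, as in the test program.
    let pads: Vec<usize> = (0..8).map(|len| trailing_padding(4, len, 8)).collect();
    assert_eq!(pads, vec![4, 3, 2, 1, 0, 7, 6, 5]);
}
```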

CAD97 commented

Ah, right, there is trailing padding to the alignment. I just read your comment wrong the first time.

And also, if it's #[repr(Rust)], it's good to reiterate that there are formally no layout guarantees anyway, so the compiler is within its rights to do wackier things if it wanted to.

The trailing slice is always at the same offset:

#![feature(layout_for_ptr)]

use std::{mem, ptr};

const PREFIX: usize = 4;

#[repr(C)]
struct DST {
    align: [u64; 0],
    prefix: [u8; PREFIX],
    slice: [u8],
}

fn main() {
    for len in 0..32 {
        let ptr = ptr::slice_from_raw_parts(ptr::null::<()>(), len);
        let ptr = ptr as *const DST;
        // Byte offset of the `slice` field from the start of the struct.
        let offset = unsafe {
            ptr::addr_of!((*ptr).slice)
                .cast::<u8>()
                .offset_from(ptr.cast())
        };
        print!("{offset} ");
    }
    println!();
}

prints all 4s.

You could argue that "dynamic tail length" includes the dynamic padding to alignment, or we could just add a "+ alignment padding" clause.

One way around the issue of #[repr(Rust)] with nongeneric slice tails not being usable is to always go through #[repr(C)] SliceTail<T, U> { prefix: T, tail: [U] } so you can rely on being able to determine/precalculate the layout, but this is obviously not ideal.
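As a sketch of what precalculating the layout could look like for such a #[repr(C)] type, the stable `Layout` API can compute it fallibly. The names `slice_tail_layout` and `Header` are hypothetical, and the struct below makes its last field generic (rather than `[U]`) only so that the result can be cross-checked through an unsizing coercion.

```rust
use std::alloc::{Layout, LayoutError};

// Hypothetical stand-ins, not from the thread: an 8-aligned prefix type and
// a repr(C) struct whose last field can be unsized to a slice.
#[allow(dead_code)]
#[repr(align(8))]
struct Header(u32);

#[allow(dead_code)]
#[repr(C)]
struct SliceTail<T, Tail: ?Sized> {
    prefix: T,
    tail: Tail,
}

/// Layout of a repr(C) `{ prefix: T, tail: [U] }` with `len` tail elements,
/// together with the byte offset of the tail. Fails instead of wrapping.
fn slice_tail_layout<T, U>(len: usize) -> Result<(Layout, usize), LayoutError> {
    let prefix = Layout::new::<T>();
    let tail = Layout::array::<U>(len)?;
    let (unpadded, tail_offset) = prefix.extend(tail)?;
    // repr(C) rounds the total size up to the struct alignment.
    Ok((unpadded.pad_to_align(), tail_offset))
}

fn main() {
    let (layout, offset) = slice_tail_layout::<Header, u8>(3).unwrap();
    println!("size = {}, align = {}, tail offset = {}", layout.size(), layout.align(), offset);

    // Cross-check against a real value obtained by unsizing coercion.
    let sized = SliceTail { prefix: Header(0), tail: [0u8; 3] };
    let unsized_ref: &SliceTail<Header, [u8]> = &sized;
    assert_eq!(Layout::for_value(unsized_ref), layout);

    // An absurd length is reported as an error.
    assert!(slice_tail_layout::<Header, u8>(usize::MAX).is_err());
}
```

The tail offset returned by `extend` does not depend on `len`, which matches the observation above that the trailing slice always sits at the same offset.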

That makes sense wrt. repr(Rust); I was mainly surprised to learn that this trailing padding can exist also in repr(C). The safety comments on these functions are the only real mentions of an "unsized tail", so I'd assumed that the tail is placed fully after and independently of the prefix. Even in repr(Rust), every field in the prefix must have a fixed offset, so padding is the only thing that can occur after the tail.

CAD97 commented

Potential bikeshed: maybe these methods should be moved to the ptr module instead of mem, where they can drop the _raw suffix. [@clarfonthey]

What about Layout::for_value_raw? The feature is layout_for_ptr, and my original pre-PR draft used Layout::for_ptr, but that has a high chance of being misread as the layout to store the pointer, rather than for the pointee. Otherwise, I think I fully agree, but without a good name for the Layout function, keeping the parallel between of_val[_raw] and for_value[_raw] seems beneficial.

Perhaps it should be called Layout::for_value_raw_unchecked, so that a fallible version Layout::for_value_raw can be added later. (Although it is confusing to have an unchecked version without a checked version.)

We reviewed this in today's @rust-lang/lang meeting.

It seems like these fit into the general story of pointee metadata, and we should consider them together with that. They may potentially want to make use of that metadata rather than operating directly on the pointers. cc #81513, which tracks the more general question.

CAD97 commented

cc also rust-lang/lang-team#166 which includes an argument that it may make sense to restrict custom pointee layout to be calculable from the pointer metadata and not allow it to be address/pointee-sensitive.

Shouldn't these functions guarantee that they are safe to call at least when casting the pointer to a reference and calling size_of_val/align_of_val on that would be safe?

CAD97 commented

It's not explicitly stated, but it is the case that these functions are safe to call whenever the pointer is valid to reborrow as a reference. The listed conditions should be a proper subset of "pointer valid to reborrow."
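The reborrow route mentioned here can be sketched on stable Rust as follows. The function name `size_via_reborrow` is hypothetical. Its safety contract is the stricter "valid to reborrow as a reference" condition, not the conditions documented for the `_raw` functions.

```rust
use std::mem;

/// Hypothetical helper: queries the size by first reborrowing as `&T`.
///
/// # Safety
/// `ptr` must be valid to reborrow as a shared reference for the duration
/// of the call (non-null, aligned, pointing to a live, initialized `T`).
unsafe fn size_via_reborrow<T: ?Sized>(ptr: *const T) -> usize {
    // SAFETY: the caller guarantees `ptr` is valid to reborrow.
    mem::size_of_val(unsafe { &*ptr })
}

fn main() {
    let data = [1u16, 2, 3];
    let ptr: *const [u16] = &data[..];
    // `data` is alive and initialized, so the reborrow is valid.
    let size = unsafe { size_via_reborrow(ptr) };
    assert_eq!(size, 6);
}
```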

@CAD97 they aren't really a proper subset, as they don't account for new unsized types that might be added to Rust in the future.

I made a PR to fix this: #103372