Tracking Issue for pointer metadata APIs
KodrAus opened this issue · 171 comments
This is a tracking issue for the RFC 2580 "Pointer metadata & VTable" (rust-lang/rfcs#2580).
The feature gate for the issue is #![feature(ptr_metadata)].
About tracking issues
Tracking issues are used to record the overall progress of implementation.
They are also used as hubs connecting to other relevant issues, e.g., bugs or open design questions.
A tracking issue is however not meant for large scale discussion, questions, or bug reports about a feature.
Instead, open a dedicated issue for the specific matter and add the relevant feature gate label.
Steps
- Implement the RFC (cc @rust-lang/libs @rust-lang/lang -- can anyone write up mentoring instructions?)
- Adjust documentation (see instructions on rustc-dev-guide)
- Stabilization PR (see instructions on rustc-dev-guide)
Unresolved Questions
Language-level:
- Is it, or should it be, UB (through validity or safety invariants) to have a raw trait object wide pointer with a dangling vtable pointer? A null vtable pointer? If not, DynMetadata methods like size may need to be unsafe fn. Or maybe something like *const () should be the metadata of trait objects instead of DynMetadata.
Right now, there is some inconsistency here: size_of_val_raw(ptr) is unsafe, but metadata(ptr).size_of() does the same thing and is safe.
Update (2024-10-04): It is definitely the case that the safety invariant for raw trait objects requires a valid vtable. So metadata(ptr).size_of() being safe is fine. size_of_val_raw(ptr) must be unsafe because of slices, so there is no inconsistency here.
- Should Metadata be required to be Freeze?
API level:
- Is *const () appropriate for the data component of pointers? Or should it be *const u8? Or *const Opaque with some new Opaque type? (Respectively *mut () and NonNull<()>)
- Should ptr::from_raw_parts and friends be unsafe fn?
- Should Thin be added as a supertrait of Sized? Or could it ever make sense to have fat pointers to statically-sized types?
- Should DynMetadata not have a type parameter? This might reduce monomorphization cost, but would force that the size, alignment, and destruction pointers be in the same location (offset) for every vtable. But keeping them in the same location is probably desirable anyway to keep code size small.
- rust-lang/libs-team#246
- DynMetadata::size_of does not always return the same value as size_of_val since the former only reads the size from the vtable, but the latter computes the size of the entire type. That seems like a pretty bad footgun?
API bikesheds:
- Name of new items: Pointee (vs. Referent?), Thin (ThinPointee?), DynMetadata (VTablePtr?), etc.
- Location of new items in core::ptr. For example: should Thin be in core::marker instead?
Implementation history
- #81172 Initial implementation
Tracked APIs
Last updated for #81172.
pub trait Pointee {
    /// One of `()`, `usize`, or `DynMetadata<dyn SomeTrait>`
    type Metadata;
}
pub trait Thin = Pointee<Metadata = ()>;
pub const fn metadata<T: ?Sized>(ptr: *const T) -> <T as Pointee>::Metadata {}
pub const fn from_raw_parts<T: ?Sized>(*const (), <T as Pointee>::Metadata) -> *const T {}
pub const fn from_raw_parts_mut<T: ?Sized>(*mut (), <T as Pointee>::Metadata) -> *mut T {}
impl<T: ?Sized> NonNull<T> {
    pub const fn from_raw_parts(NonNull<()>, <T as Pointee>::Metadata) -> NonNull<T> {}
    /// Convenience for `(ptr.cast(), metadata(ptr))`
    pub const fn to_raw_parts(self) -> (NonNull<()>, <T as Pointee>::Metadata) {}
}
impl<T: ?Sized> *const T {
    pub const fn to_raw_parts(self) -> (*const (), <T as Pointee>::Metadata) {}
}
impl<T: ?Sized> *mut T {
    pub const fn to_raw_parts(self) -> (*mut (), <T as Pointee>::Metadata) {}
}
/// `<dyn SomeTrait as Pointee>::Metadata == DynMetadata<dyn SomeTrait>`
pub struct DynMetadata<Dyn: ?Sized> {
    // Private pointer to vtable
}
impl<Dyn: ?Sized> DynMetadata<Dyn> {
    pub fn size_of(self) -> usize {}
    pub fn align_of(self) -> usize {}
    pub fn layout(self) -> crate::alloc::Layout {}
}
unsafe impl<Dyn: ?Sized> Send for DynMetadata<Dyn> {}
unsafe impl<Dyn: ?Sized> Sync for DynMetadata<Dyn> {}
impl<Dyn: ?Sized> Debug for DynMetadata<Dyn> {}
impl<Dyn: ?Sized> Unpin for DynMetadata<Dyn> {}
impl<Dyn: ?Sized> Copy for DynMetadata<Dyn> {}
impl<Dyn: ?Sized> Clone for DynMetadata<Dyn> {}
impl<Dyn: ?Sized> Eq for DynMetadata<Dyn> {}
impl<Dyn: ?Sized> PartialEq for DynMetadata<Dyn> {}
impl<Dyn: ?Sized> Ord for DynMetadata<Dyn> {}
impl<Dyn: ?Sized> PartialOrd for DynMetadata<Dyn> {}
impl<Dyn: ?Sized> Hash for DynMetadata<Dyn> {}
After experimenting with custom implementations of Box, I think there is a strong case for having strongly typed meta-data for all kinds of pointers.
The pre-allocator representation of Box is:
struct Box<T: ?Sized> { ptr: NonNull<T>, }
The post-allocator representation is very similar:
struct Box<T: ?Sized, A: Allocator = Global> {
    allocator: A,
    ptr: NonNull<T>,
}
Both automatically implement CoerceUnsized<Box<U>> where T: Unsize<U>, and all is well.
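As a small illustration of the coercion discussed above, here is a sketch on stable Rust using the standard Box. The function names boxed_slice and boxed_debug are made up for this example; only the implicit unsizing coercions themselves come from the language.

```rust
use std::fmt::Debug;

/// Unsizes an owned array into an owned slice.
/// The length (3) moves from the type into the pointer metadata.
pub fn boxed_slice() -> Box<[u8]> {
    let array: Box<[u8; 3]> = Box::new([1, 2, 3]);
    array // implicit unsizing coercion: Box<[u8; 3]> -> Box<[u8]>
}

/// Unsizes an owned integer into a trait object.
/// The vtable for `i32: Debug` becomes the pointer metadata.
pub fn boxed_debug() -> Box<dyn Debug> {
    let value: Box<i32> = Box::new(7);
    value // implicit unsizing coercion: Box<i32> -> Box<dyn Debug>
}
```

Both conversions rely on Box<T>: CoerceUnsized<Box<U>>, which is the property a storage-generic Box would need to preserve.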
If one wants to make Box generic over its storage, then the representation becomes:
pub struct RawBox<T: ?Sized + Pointee, S: SingleElementStorage> {
    storage: S,
    handle: S::Handle<T>,
}
If S::Handle<T> == NonNull<T>, then Box is still coercible; however, in the case of inline storage, that is:
- neither possible: when the Box is moved, so is the storage, and therefore any pointer into the storage is invalidated.
- nor desirable: in the case of inline storage, the pointer is redundant, wasting 8 bytes.
Hence, in the case of inline storage, S::Handle<T> is best defined as <T as Pointee>::Metadata.
In order to have Box<T>: CoerceUnsized<Box<U>> where T: Unsize<U>:
- We need: S::Handle<T>: CoerceUnsized<S::Handle<U>> where T: Unsize<U>,
- Which means: <T as Pointee>::Metadata: CoerceUnsized<<U as Pointee>::Metadata> where T: Unsize<U>.
And of course, Box being coercible is very much desirable.
As a result, I believe a slight change of course is necessary:
- All metadata should be strongly typed -- be it Metadata<dyn Debug>, Metadata<[u8]> or Metadata<[u8; 3]> -- no more () or usize.
- The compiler should automatically implement Metadata<T>: CoerceUnsized<Metadata<U>> where T: Unsize<U>.
I would note that having a single Metadata<T> type rather than SizedMetadata<T>, SliceMetadata<[T]>, DynMetadata<dyn T> is not necessary, only the coercion is, and since the compiler is generating those, it's perfectly free to create them "cross type". I just used the same name as a short-cut.
Addendum: What's all that jazz about inline storage?
At a high-level, Box is not so much about where memory comes from, it's a container which allows:
- Dynamically Sized Types.
- And therefore Type Erasure.
Having the memory inline in the Box type preserves those 2 key properties whilst offering a self-contained type (not tied to any lifetime, nor any thread). It's allocation-less type-erasure.
A motivating example is therefore fn foo<T>() -> Box<dyn Future<T>, SomeInlineStorage>: it returns a stack-allocated container which contains any future type (fitting in the storage) which can evaluate to T.
Box<dyn Future<T>, SomeInlineStorage> would have to be dynamically-sized itself, right? So in order to manipulate it without another lifetime or heap-allocated indirection you’d need the unsized locals language feature. And if you have that you can manipulate dyn Future<T> directly, so what’s the point of a box with inline storage?
IMO this is different from the case of Vec, which provides useful functionality on top of its storage so that ArrayVec (a.k.a. Vec with inline storage) makes sense. But Box pretty much is its storage.
Box<dyn Future<T>, SomeInlineStorage> would have to be dynamically-sized itself, right?
No, that's the whole point of it actually.
In C++, you have std::string and std::function implementations typically using the "short string optimization", that is, a sufficiently small payload is just embedded inside, and larger ones require a heap allocation.
This is exactly the same principle:
- libstdc++'s std::string can contain up to 15 non-NUL characters without heap allocation on 64-bit platforms. sizeof(std::string) == 32, regardless of whether it's empty, contains a single character, or contains 15.
So, here, SomeInlineStorage is generally speaking over-reserving. You settle on a fixed alignment and size, and then you may get mem::size_of::<Box<dyn Future, SomeInlineStorage>>() == 128 regardless of what's stored inside.
If you stored a single pointer (+v-table), well, you're paying a bit too much, but that's the price for flexibility. It's up to you to size it appropriately for the largest variant.
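To make the fixed-size point concrete, here is a minimal sketch on stable Rust. InlineStorage, InlineBox, fits and inline_box_size are hypothetical names invented for this illustration; they are not the storage-poc types, and the single usize field merely stands in for pointer metadata.

```rust
use std::mem::{align_of, size_of};

/// Hypothetical inline storage: `N` raw bytes with a fixed 8-byte alignment.
#[repr(C, align(8))]
pub struct InlineStorage<const N: usize> {
    pub bytes: [u8; N],
}

/// Hypothetical inline box: the storage plus one word standing in for
/// the pointer metadata (for example a vtable pointer).
pub struct InlineBox<const N: usize> {
    pub storage: InlineStorage<N>,
    pub metadata: usize,
}

/// Size of the container type. It is a property of the type alone and
/// does not depend on which payload is stored inside.
pub fn inline_box_size() -> usize {
    size_of::<InlineBox<120>>()
}

/// Whether a payload of type `T` would fit in `N` bytes at 8-byte alignment.
pub fn fits<T, const N: usize>() -> bool {
    size_of::<T>() <= N && align_of::<T>() <= 8
}
```

With 120 bytes of storage and one metadata word, inline_box_size() evaluates to 128 on a typical 64-bit target, whether the payload is a u64 or something close to the 120-byte limit.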
In any case, unsized locals is strictly unnecessary, as can be seen in the tests of storage-poc's RawBox.
Oh I see, so this is more like SmallVec than ArrayVec and "inline" really means inline up to a certain size chosen at compile-time, and heap-allocated for values that turn out at run-time to be larger?
Back to pointer metadata though, I have a bit of a hard time following the CoerceUnsized discussion. But could you manage what you want if the handle for storage-generic Box<T> is not T::Metadata directly but another generic struct that contains that together with PhantomData<T>?
Oh I see, so this is more like SmallVec than ArrayVec and "inline" really means inline up to a certain size chosen at compile-time, and heap-allocated for values that turn out at run-time to be larger?
It's up to you, you can have either a purely inline storage, or you can have "small" inline storage with heap fallback.
The main point is that the "inline" portion is always of fixed size and alignment (up to the storage) and therefore RawBox itself is always Sized.
(You can see an equivalent of ArrayVec instantiated in this test-suite: RawVec<T, inline::SingleRange<...>>)
Back to pointer metadata though, I have a bit of a hard time following the CoerceUnsized discussion. But could you manage what you want if the handle for storage-generic Box<T> is not T::Metadata directly but another generic struct that contains that together with PhantomData<T>?
I don't think so, given the language from the documentation of CoerceUnsized:
For custom types, the coercion here works by coercing Foo<T> to Foo<U> provided an impl of CoerceUnsized<Foo<U>> for Foo<T> exists. Such an impl can only be written if Foo<T> has only a single non-phantomdata field involving T. If the type of that field is Bar<T>, an implementation of CoerceUnsized<Bar<U>> for Bar<T> must exist. The coercion will work by coercing the Bar<T> field into Bar<U> and filling in the rest of the fields from Foo<T> to create a Foo<U>. This will effectively drill down to a pointer field and coerce that.
It appears that PhantomData fields are ignored for the purpose of coercion.
Note how a SliceLen<T> would be the ideal metadata for a [T] slice, as it could express the fact that the range of valid lengths for a slice reference depends on the size of T.
However, as there are quite a few ways of manipulating slice pointers without unsafe, e.g. via ptr::slice_from_raw_parts, I don't know if such types can actually enforce all that much at compile time.
there are quite a few ways of manipulating slice pointers without unsafe, e.g. via ptr::slice_from_raw_parts
Yes that’s the idea behind this RFC: generalize slice_from_raw_parts to other kinds of DSTs
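For reference, this is roughly what the existing stable slice API looks like in use. The helper names raw_slice_of and sum_first_three are hypothetical; ptr::slice_from_raw_parts is the real standard library function. Building the wide pointer is safe, and only the dereference needs unsafe.

```rust
use std::ptr;

/// Builds a raw slice pointer from a data pointer and a length.
/// No `unsafe` is needed here, even if `len` were wrong.
pub fn raw_slice_of(data: &[u32; 4], len: usize) -> *const [u32] {
    ptr::slice_from_raw_parts(data.as_ptr(), len)
}

/// Reads the slice back. The `unsafe` block is where the caller
/// vouches that the length stored in the metadata is valid.
pub fn sum_first_three(data: &[u32; 4]) -> u32 {
    let raw = raw_slice_of(data, 3);
    // SAFETY: `raw` points at 3 initialized `u32`s borrowed from `data`.
    let slice: &[u32] = unsafe { &*raw };
    slice.iter().sum()
}
```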
So a thing that seems to be missing here is a stable layout for DynMetadata itself.
A really annoying thing is that currently you cannot opaquely pass trait objects across FFI without doing a second allocation, because Box<dyn Trait> has unknown layout. Are there plans to make this feasible? To me this has been the main use case for work on DST APIs.
Would it make sense to add conversions between DynMetadata and some raw pointer type? Would that help the FFI use case?
Yes, it would. It would be annoying to use, but it would suffice.
Doesn't even need to be a pointer type, just an opaque type with a well-defined layout. Though pointers make it easier for other tools to use, definitely.
We could document that DynMetadata itself has pointer size and ABI. (And introduce some other metadata type if there’s ever a new kind of DST that needs something else.)
We could document that DynMetadata itself has pointer size and ABI. (And introduce some other metadata type if there’s ever a new kind of DST that needs something else.)
That would be nice, and it would be nice if it had explicit conversion functions to *const !
*const (), sure, why not.
A *const ! pointer though would always be UB to dereference. So it… encodes in the type system that it is always dangling? That doesn’t seem a good fit for vtables.
That works too yeah
It was brought to my attention that this feature has some very subtle interaction with unsafe code. Specifically, the following function is currently sound in the sense that safe code cannot cause any UB even when it calls this function:
pub fn make_weird_raw_ptr() -> *const dyn Send {
    unsafe { std::mem::transmute((0x100usize, 0x100usize)) }
}
This RFC is a breaking change in that it makes the above function unsound:
let ptr = make_weird_raw_ptr();
let meta = metadata(ptr);
let size = meta.size(); // *oops* UB
At the very least, this should be listed as an open concern to be resolved.
Maybe metadata should only be safe on references, not raw pointers?
Should DynMetadata not have a type parameter? This might reduce monomorphization cost, but would force that the size, alignment, and destruction pointers be in the same location (offset) for every vtable. But keeping them in the same location is probably desirable anyway to keep code size small.
Don't size and align already have to be in the same location? Certainly Miri assumes this in its implementations of size_of_val and align_of_val for trait objects -- and I don't see a way to implement this without that information being at a consistent location.
For drop, I don't understand why it is mentioned here as also having to be in the same location.
@RalfJung Yes, this is an important point. It came up in RFC discussions but I forgot to incorporate it in unresolved questions then. I’ve done so in the issue description here.
My understanding was that make_weird_raw_ptr is sound in current compilers but that related language rules are still mostly undecided. Has that changed?
Don't size and align already have to be in the same location?
I’m also skeptical that it could be any other way, this point is mostly copied from a comment on the RFC
Maybe metadata should only be safe on references, not raw pointers?
I think that would be a serious limitation. I don’t see a reason extracting the components of any raw pointer and putting them back together shouldn’t be sound.
However if we end up deciding that raw trait object pointers shouldn’t have any validity invariant for their vtable pointer then DynMetadata methods like size could be made unsafe fns.
I’ve done so in the issue description here.
Thanks. Notice however that for this RFC to work, it is not required to make "valid vtable" part of the validity invariant. Making it part of the safety invariant would be sufficient, since this is a library-level concern.
My understanding was that make_weird_raw_ptr is sound in current compilers but that related language rules are still mostly undecided. Has that changed?
Indeed, neither the validity invariant nor the safety invariant of raw pointers is pinned down exactly. I think it is safe to say though that people would expect these invariants to be as weak as is at all possible, i.e., to require as little as possible. Even the fact that the vtable pointer is non-NULL is tripping people up and we might want to change that.
size_of_val_raw and similar APIs are unsafe fn for this exact reason.
However if we end up deciding that raw trait object pointers shouldn’t have any validity invariant for their vtable pointer then DynMetadata methods like size could be made unsafe fns.
That would be another option, yes. (There could also be SafeDynMetadata as the metadata of dyn Trait references where these methods could still be safe.)
There could also be SafeDynMetadata as the metadata of dyn Trait references where these methods could still be safe.
This would require a design different from the current Pointee trait where the metadata type <dyn Trait as Pointee>::Metadata is independent of the kind of pointer/reference.
Is there a reason DynMetadata::drop_in_place isn't exposed similar to how DynMetadata::size_of and DynMetadata::align_of are?
This would be useful for a type-erased ThinBox.
You’d need a data pointer anyway to call it, and if you have that you can use ptr::from_raw_parts_mut and then ptr::drop_in_place.
Oh you're right somehow I didn't think of that.
Ah wait. That wouldn't work for a type-erased ThinBox because ptr::from_raw_parts_mut is generic over T but we don't know what T is anymore.
For a "real world" example: RustPython makes their own vtable which stores the T's drop_in_place inline before erasing T.
See pyobject.rs
Let’s say you have impl SomeTrait for SomeStruct. You’d use from_raw_parts_mut::<dyn SomeTrait>, that is from_raw_parts_mut::<T> with T = dyn SomeTrait. Your code before type erasure may have another type variable T = SomeStruct that happens to share the same name T, but the two variables don’t refer to the same type.
Similarly you don’t need to call drop_in_place::<SomeStruct> directly, instead use drop_in_place::<dyn SomeTrait> which takes care of finding the destructor function pointer in the vtable.
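The drop_in_place::<dyn SomeTrait> part of this can already be tried on stable Rust. The sketch below is only an illustration: SomeStruct's flag field and the drop_erased function are invented here, and the wide pointer comes from Box::into_raw rather than from the unstable from_raw_parts_mut.

```rust
use std::alloc::{dealloc, Layout};
use std::cell::Cell;
use std::ptr;
use std::rc::Rc;

pub trait SomeTrait {}

/// Hypothetical implementor whose destructor records that it ran.
pub struct SomeStruct(pub Rc<Cell<bool>>);

impl SomeTrait for SomeStruct {}

impl Drop for SomeStruct {
    fn drop(&mut self) {
        self.0.set(true);
    }
}

/// Drops a value through a type-erased pointer, then frees the allocation by hand.
/// Returns whether the concrete destructor was reached via the vtable.
pub fn drop_erased() -> bool {
    let dropped = Rc::new(Cell::new(false));
    let boxed: Box<dyn SomeTrait> = Box::new(SomeStruct(dropped.clone()));
    let raw: *mut dyn SomeTrait = Box::into_raw(boxed);
    unsafe {
        // Size and alignment are read from the vtable of the erased type.
        let layout = Layout::for_value(&*raw);
        // `drop_in_place::<dyn SomeTrait>` finds the destructor in the vtable.
        ptr::drop_in_place(raw);
        // Release the memory that `Box` obtained from the global allocator.
        if layout.size() != 0 {
            dealloc(raw as *mut u8, layout);
        }
    }
    dropped.get()
}
```

Nothing in drop_erased names SomeStruct after the Box is created, which is the point being made about type erasure.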
You're right again. Thanks!
I’ve added an unresolved question on the use of *const () for the data component
Maybe *const !? It's still unstable though.
I have a slight preference for *const u8 since this is what we use for opaque pointers in the allocator API.
Not sure if this was mentioned yet, but Pointee looks a lot like a marker trait, and traditionally we'd put them under the marker module, rather than others. For example DiscriminantKind. To me the fact that all types implement Pointee suggests that it also should be in marker rather than ptr.
However if we end up deciding that raw trait object pointers shouldn’t have any validity invariant for their vtable pointer then DynMetadata methods like size could be made unsafe fns.
That would be another option, yes. (There could also be SafeDynMetadata as the metadata of dyn Trait references where these methods could still be safe.)
How about have:
pub const fn from_raw_parts<T: ?Sized>(*const (), MaybeUninit<<T as Pointee>::Metadata>) -> *const T {}
pub const fn from_raw_parts_mut<T: ?Sized>(*mut (), MaybeUninit<<T as Pointee>::Metadata>) -> *mut T {}
impl<T: ?Sized> *const T {
    pub const fn to_raw_parts(self) -> (*const (), MaybeUninit<<T as Pointee>::Metadata>) {}
}
impl<T: ?Sized> *mut T {
    pub const fn to_raw_parts(self) -> (*mut (), MaybeUninit<<T as Pointee>::Metadata>) {}
}
@nbdd0121 That looks like exactly what we have in Nightly today. What am I missing?
@nbdd0121 That looks like exactly what we have in Nightly today. What am I missing?
There are MaybeUninit<> around <T as Pointee>::Metadata, so we don't have to place any safety invariants on raw pointers.
This way @RalfJung's make_weird_raw_ptr example would still be sound:
pub fn make_weird_raw_ptr() -> *const dyn Send {
    unsafe { std::mem::transmute((0x100usize, 0x100usize)) }
}
let ptr = make_weird_raw_ptr();
let meta = metadata(ptr);
// this can't compile now without unsafe `assume_init`
// let size = meta.size();
I’ve added an unresolved question the use of *const () for the data component
Would it be possible to have *const T be the data component for thin pointers and slice pointers and *const ()/*const !/*const u8/other the data component for trait objects?
That would require another automagic associated type like Metadata. It’s possible but not simple.
Regarding *const ! specifically, I don’t think it would be appropriate here. There can be no value of type !, so a *const ! has the semantics of never pointing to a valid value. However we want the data pointer of &dyn SomeTrait to be very much pointing to a valid value, just one of unspecified type and size.
I realized that the question I raised about having <T as Pointee>::Metadata being a strong type (and kinda forgot about, sorry) never quite got an answer. For reference #81513 (comment).
Summing it up: when experimenting with the Storage API, it came up that in order for Box<T> to be coercible into Box<U>, I needed:
<T as Pointee>::Metadata: CoerceUnsized<<U as Pointee>::Metadata> where T: Unsize<U>
It is not clear whether this is the best way to achieve the functionality above (it's not even clear if there's any other); still, it raises the specter that such an implementation may be necessary, and as far as I can see this would require ensuring that <T as Pointee>::Metadata is always a strong type such as Metadata<T> and not () or usize as it is currently for Sized types and slice types respectively.
Since "downgrading" Metadata<T> to an alias for () or usize is easier than upgrading () or usize to Metadata<T> down the line for backward compatibility reasons, it seems conservative to ensure that <T as Pointee>::Metadata is a strong type, one for which traits can be implemented.
So:
- Should <T as Pointee>::Metadata be a strong type?
- Are there concerns that this being a strong type may cause issues? (Compile time? Bloat?)
Since "downgrading" Metadata<T> to an alias for () or usize is easier
Nope, that’s also a breaking change. A downstream crate might have separate impls for one of their traits. If the two types are later unified, those impls start colliding.
As implemented by #81172, DynMetadata is not repr(C)? Are we not supposed to pass this type through FFI?
There is no guarantee about what DynMetadata contains. Currently it is a single pointer to a vtable, but a single pointer per trait of a trait object is currently also a valid implementation strategy. It probably won't happen, but this was one of the suggested implementation methods to allow upcasting dyn TraitA + TraitB to dyn TraitA.
Is this feature not intended to provide a better FFI experience, i.e. we don't have to transmute to/from TraitObject? If so, what should I do when I need to pass a dyn pointer to/from FFI using the new interface? Should I transmute the DynMetadata, or put it in a Box and pass the pointer?
The RFC itself suggests putting DynMetadata in a repr(C): https://github.com/rust-lang/rfcs/pull/2580/files#diff-596b2a682c8f7063a250fdf3a1541d6249c84f01eed706d2b631a3a754a3ba66R95
Only adding repr(C) to DynMetadata would not help with FFI (or any other use case). Since the fields of DynMetadata are private there is no stability guarantee about them. Like bjorn3 mentioned it could be more than one pointer for some dyn types in the future.
For your FFI use case we would need the Language Team to decide that DynMetadata will always be a single pointer (at least for some dyn types), then the standard library to provide conversion to/from something like *const ().
The RFC itself suggests putting DynMetadata in a repr(C):
A struct being repr(C) (in this case WithMeta) does not say anything about the repr of the types of its fields. For example this compiles without any warning: https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=dbb86dae265ad1938d4002b2d7a2857a
#[repr(C)]
pub struct Foo {
    a: Bar,
    b: Bar,
}
struct Bar(u8, u32, u8); // likely reordered
A struct being repr(C) ...
While being technically true, I assumed by putting repr(C) on a type the RFC expresses the intention of using that type for FFI. Having an FFI-unsafe field type in a repr(C) type causes the outer type to be FFI-unsafe too (or so the compiler warns me), which kind of defeats the purpose of putting repr(C). Of course my assumption could be wrong.
Having written the ThinBox example you linked, I can confirm that the intention there was to use repr(C) in order to disable potential field re-ordering, so that the unsafe code dereferencing a *mut WithMeta<()> that was initialized as WithMeta<S> stays valid. In that example, WithMeta is a private implementation detail of the ThinBox library and is unrelated to FFI.
What about instead of having <dyn Trait as Pointee>::Metadata be DynMetadata storing a vtable pointer, it's instead *const DynMetadata pointing to the vtable itself? It would always require unsafe to access dyn metadata, but you need that unsafe somewhere to get dyn metadata from *const dyn Trait.
That would prevent the trait object metadata from ever being more than a single pointer. (eg one for each trait vtable) Also accessing the vtables isn't just unsafe, you are depending on implementation details, so it may break at any time.
Should there be a direct API to construct a DynMetadata without specific pointers?
Much like:
impl<Dyn: ?Sized> DynMetadata<Dyn> {
    pub fn new<T>() -> Self where T: Dyn {}
}
I don't know much about the implementation details, but I think it's safe to some degree because DynMetadata doesn't store information about the content or the address of the raw pointer.
If such an API exists, we'll be able to manually construct DST pointers, which sounds great to me.
T: Dyn wouldn't work; Dyn is a type not a trait here.
Oh yeah that's true, but I hope there's some other way to do the stuff though. :)
This works today:
#![feature(const_fn_trait_bound)]
#![feature(unsize)]
#![feature(ptr_metadata)]
use core::marker::Unsize;
use core::ptr::DynMetadata;
use core::ptr::Pointee;
const fn new_metadata<Dyn: ?Sized, T>() -> DynMetadata<Dyn>
where
    T: Unsize<Dyn>,
    Dyn: Pointee<Metadata = DynMetadata<Dyn>>,
{
    (core::ptr::null::<T>() as *const Dyn).to_raw_parts().1
}
fn main() {
    println!("{:?}", new_metadata::<dyn core::any::Any, i32>());
}
There needs to be some pointer or reference that is "unsized" to a trait object at some point in order for the appropriate vtable to be generated. I think that can be a raw pointer with null data component: std::ptr::metadata::<dyn Trait>(std::ptr::null::<T>())
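The unsizing step on its own already works on stable Rust, even though extracting the metadata afterwards needs the feature gate. A minimal sketch follows; null_trait_object and is_wide are hypothetical helper names, and the two-word pointer size is how current compilers lay out trait object pointers, not a documented guarantee.

```rust
use std::fmt::Debug;
use std::mem::size_of;
use std::ptr;

/// Unsizes a null thin pointer into a trait object pointer.
/// The cast is what makes the compiler emit the `i32: Debug` vtable.
pub fn null_trait_object() -> *const dyn Debug {
    ptr::null::<i32>() as *const dyn Debug
}

/// Checks that the trait object pointer carries one extra word
/// (the metadata) next to the data pointer.
pub fn is_wide() -> bool {
    size_of::<*const dyn Debug>() == 2 * size_of::<*const ()>()
}
```

The resulting pointer still reports is_null(), since only the data component is inspected for that check.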
What if the Metadata for trait objects were MaybeUninit<DynMetadata<dyn Trait>>? That way in order to use it you would have to first go through an unsafe assume_init while still allowing safe access to raw slice length data.
How should the code determine if it is ok to .assume_init()? If there is an automated way to do so, I think Option<DynMetadata<dyn Trait>> might be a better idea (if this is still insufficient, maybe something more opaque should be returned which allows inspection, although DynMetadata should already have that "role")
How should the code determine if it is ok to .assume_init()?
You shouldn't be able to for raw pointers - simple as that. The information necessary to do so isn't available at runtime since the underlying type of the trait object is not known. The state of thin raw pointers in Rust is that it is always safe to construct one, but in order to use/dereference it, you need to have unsafe somewhere to assert that the invariants of the type are being upheld (the simplest being "this has to be a valid memory location", but this includes other rules like "is the bit pattern at this location valid for type T"). When you call the safe methods on DynMetadata, you're implicitly asserting that the DynMetadata is valid by dereferencing the internal pointer, when it should be explicit because it may invoke UB.
By calling assume_init, you declare that you're upholding the invariants of DynMetadata and as such its safe methods can be called. Otherwise, you can cause UB based on whether the raw pointer is legit as shown in #81513 (comment). With the change, it'd end up looking like:
let ptr: *const dyn Send = make_weird_raw_ptr();
let meta: MaybeUninit<DynMetadata<dyn Send>> = metadata(ptr);
// I'm asserting that the `DynMetadata` is valid, but I violated the invariant that it's derived from valid metadata!
// This causes UB, but that's expected since I didn't uphold the invariants of unsafe code.
let size = unsafe { meta.assume_init() }.size();
That invariant is currently phrased in ptr::from_raw_parts as:
For trait objects, the metadata must come from a pointer to the same underlying erased type.
This is what the documentation states, but there is no unsafe for the programmer to assert this invariant anywhere, which breaks a common assumption Rust programmers make about raw pointers: safe operations won't dereference them without some declaration or check that invariants are upheld. I personally like this language a bit better:
For trait object pointers to dyn Trait with an underlying type T, the metadata must have been derived from a valid dyn Trait reference of the same underlying type, such as what is returned by ptr::metadata(&T as &dyn Trait).
I believe that should be sufficient, since you can't create a raw pointer to a dyn Trait without first going through an intermediate reference, which ensures the vtable exists. This is a more strict version, requiring that vtables can't ever be duplicated, which we may not want to guarantee:
Given any two wide pointers to trait objects dyn Trait of the same underlying type, the DynMetadata<dyn Trait> metadata must be identical.
The only way to assert that the metadata is valid automatically is if it's a reference, since that's an invariant of references. This is why @RalfJung suggested having a safe version of from_raw_parts for references and I presume an unsafe version for raw pointers. That may end up being the most tractable option.
Anyways, this is a long way to say that raw pointers, wide and thin, shouldn't have any invariants on them - that's why you use raw pointers after all! It's when using the pointers where the invariants should kick in, and what the assume_init is meant to represent.
This is what the documentation states, but there is no unsafe for the programmer to assert this invariant anywhere
Well, there is -- when dereferencing the raw pointer.
Anyways, this is a long way to say that raw pointers, wide and thin, shouldn't have any invariants on them - that's why you use raw pointers after all! It's when using the pointers where the invariants should kick in, and what the assume_init is meant to represent.
There IMO should be one invariant even on thin raw pointers: they must not be uninit. IOW, I think that MaybeUninit::<*const u8>::uninit().assume_init() should be UB (and similar for uninit integers). Also see rust-lang/unsafe-code-guidelines#71.
So I don't think using MaybeUninit for the metadata is a good idea, since I don't think we want to allow literally uninit memory in metadata.
This is what the documentation states, but there is no unsafe for the programmer to assert this invariant anywhere
Well, there is -- when dereferencing the raw pointer.
Yes, but dereferencing a raw pointer requires an unsafe block. metadata(dyn_ptr).layout() as-is does not, and that's the problem.
So I don't think using MaybeUninit for the metadata is a good idea, since I don't think we want to allow literally uninit memory in metadata.
I agree. I initially chose MaybeUninit<T> because it is a common way to represent a T that may have an invalid bitfield that can be asserted valid through an unsafe gate, like Pin has with its Drop guarantee. Of course, it's actually the canonical way to represent uninitialized data and the typed way to represent that safely. Especially because it'd be in the standard library, it would imply that raw wide pointers may contain uninitialized data, and they should not be able to, as integer-represented types.
Should there just be a wrapper type then? <dyn Trait as Pointee>::Metadata then becomes:
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash, Ord, PartialOrd)]
pub struct RawDynMetadata<T: ?Sized>(DynMetadata<T>);
impl<T: ?Sized> RawDynMetadata<T> {
    // or some better name
    /// # Safety
    /// - `T` must be a trait object.
    /// - This `RawDynMetadata` must have been derived from the metadata of a valid reference to `T`.
    pub unsafe fn assume_valid(self) -> DynMetadata<T> {
        self.0
    }
}
A nice advantage is that it is still safe for slice metadata on raw pointers, especially since we don't have #71146 yet. One can also already safely store and use the accessible metadata of a trait object reference with Layout::for_value.
Yes, but dereferencing a raw pointer requires an unsafe block. metadata(dyn_ptr).layout() as-is does not, and that's the problem.
Could you specify what exactly the problem is?
The current API is sound, i.e., you will (to my knowledge) not get UB here without using unsafe code. So it must be some other property you are looking for that is violated.
However, there is indeed an inconsistency here in that size_of_val_raw
is unsafe
, but using metadata().size_of()
one can implement the same thing entirely in safe code.
Has the lang team formally decided what validity invariants *const dyn Trait
has? (Or more generally, pointer metadata) I feel that should be the starting point.
Currently, Pointee::Metadata
is bound by the following traits: Copy
, Send
, Sync
, Ord
, Hash
, Unpin
. While all of them are required, they are not actually sufficient for being Metadata
. The compiler must also be able to know the size (and align) of the pointed value to generate code for std::mem::size_of_val
(align_of_val
).
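To make the point above concrete: for the built-in DST kinds, the metadata (a length, or a vtable) is exactly the information needed to compute size and alignment. The stable-Rust sketch below only illustrates that relationship; the helper names are invented for this example and are not part of any proposed API.

```rust
use std::fmt::Debug;
use std::mem::{align_of_val, size_of, size_of_val};

/// Size of a `[T]`, derived only from its length (what slice metadata stores).
pub fn slice_size_from_len<T>(len: usize) -> usize {
    len * size_of::<T>()
}

/// Size of a `str`, derived only from its byte length.
pub fn str_size_from_len(len: usize) -> usize {
    // `str` is a sequence of bytes, so the size equals the length.
    len
}

/// Size and alignment of a trait object. Both values are read through the
/// vtable that the wide reference carries as its metadata.
pub fn dyn_size_and_align(value: &dyn Debug) -> (usize, usize) {
    (size_of_val(value), align_of_val(value))
}
```

Going through a reference keeps all of this safe, which is the property the raw-pointer variants cannot offer in general.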
Shouldn't Metadata
be bound by a trait that would provide such functions? If we ever want to support DSTs with custom metadata, this seems required
pub trait Pointee {
#[lang = "metadata_type"]
type Metadata: Copy + Send + Sync + Ord + Hash + Unpin + const PointerMetadata<Self>;
// P.S. I'm not sure if `const Trait` bounds are currently possible,
// we may need to wait until they are implemented, before stabilizing this feature.
}
pub unsafe trait PointerMetadata<Target> {
fn size_of_val(val: &Target) -> usize;
fn align_of_val(val: &Target) -> usize;
}
Example implementations
// Naming is bikesheddable
#[non_exhaustive]
pub struct SizedMetadata;
// Compiler can probably have a fast-path for sized types/slices/strs/trait objects to lower costs
unsafe impl<T> const PointerMetadata<T> for SizedMetadata {
fn size_of_val(_: &T) -> usize { mem::size_of::<T>() }
fn align_of_val(_: &T) -> usize { mem::align_of::<T>() }
}
pub struct SliceLen(usize);
unsafe impl<T> const PointerMetadata<[T]> for SliceLen {
fn size_of_val(val: &[T]) -> usize {
let Self(len) = metadata(val);
len * mem::size_of::<T>()
}
fn align_of_val(_: &[T]) -> usize { mem::align_of::<T>() }
}
pub struct StrLen(usize);
unsafe impl const PointerMetadata<str> for StrLen {
fn size_of_val(val: &str) -> usize {
let this = metadata(val);
this.0
}
fn align_of_val(_: &str) -> usize { 1 }
}
unsafe impl<T: ?Sized> const PointerMetadata<T> for DynMetadata<T> {
fn size_of_val(val: &T) -> usize {
let this = metadata(val);
this.size_of()
}
fn align_of_val(val: &T) -> usize {
let this = metadata(val);
this.align_of()
}
}
// Theoretical
#[non_exhaustive]
pub struct ThinByteSliceMetadata;
// ThinByteSlice is a custom DST type (usize, [u8])
unsafe impl const PointerMetadata<ThinByteSlice> for ThinByteSliceMetadata {
fn size_of_val(val: &ThinByteSlice) -> usize {
// Safety: `ThinByteSlice` is guaranteed to have `len: usize` as its first field
unsafe { *val.to_raw_parts().0.cast::<usize>() }
}
fn align_of_val(val: &ThinByteSlice) -> usize {
mem::align_of::<usize>()
}
}
Note that size_of_val_raw
and align_of_val_raw
currently prohibit calling themselves with a pointee type that doesn't have a slice, trait object or extern type as its last field, so it's ok to require a reference in PointerMetadata
methods.
While such a trait could be implemented for usize
/()
/etc. instead of new types, I would strongly argue against this, as it makes extensibility harder and adds weird methods like usize::align_of_val
.
An extension that could make `align_of_val_raw` safe(r?)
// Implemented for SliceLen<_>, StrLen<_>, DynMetadata<_>, SizedMetadata<_>
pub unsafe trait PointerMetadataThatDoesNotNeedReference<Target>: PointerMetadata<Target> {
fn size_of_val_raw(val: *const Target) -> usize;
fn align_of_val_raw(val: *const Target) -> usize;
}
// std::mem
pub /* unsafe ? */ const fn align_of_val_raw<T>(val: *const T) -> usize
where
T: ?Sized,
<T as Pointee>::Metadata: ~const PointerMetadataThatDoesNotNeedReference<T>
{
<T as Pointee>::Metadata::align_of_val_raw(val)
}
// same for size
The only downside of this proposal that I see is that the compiler would be forced to generate PointerMetadata
impls like this one:
struct S<T: ?Sized> {
a: A,
tail: T,
}
// A lot of compiler magic required to actually support this
unsafe impl<T: ?Sized> PointerMetadata<S<T>> for T::Metadata {
fn size_of_val(val: &S<T>) -> usize {
mem::size_of::<A>() + T::Metadata::size_of_val(&val.tail)
}
fn align_of_val(val: &S<T>) -> usize {
cmp::max(mem::align_of::<A>(), T::Metadata::align_of_val(&val.tail))
}
}
Or otherwise, we lose the guarantee that metadata of (..., T)
has the same type as metadata of T
.
I think there should be a way around this, but I can't quite see how we can make this part better...
Actually, now that I've thought about size_of_val
/align_of_val
functions a little bit more, I may have an idea how to remove the downside from #81513 (comment).
The solution is just to invert the dependencies. Instead of SliceLen<_>
implementing PointerMetadata
, size_of_val
/align_of_val
should be implemented by the type itself. This would allow the compiler-generated impl to be a lot less magical and even allowed by the current coherence rules, etc.
pub trait Pointee {
#[lang = "metadata_type"]
type Metadata: Copy + Send + Sync + Ord + Hash + Unpin;
#[rustc_only_trait_resolvable]
const fn size_of_self(&self) -> usize;
#[rustc_only_trait_resolvable]
const fn align_of_self(&self) -> usize;
// P.S. `const fn` in traits aren't currently supported, so this isn't implementable yet
}
(#[rustc_only_trait_resolvable]
is a theoretical annotation similar to #[rustc_skip_array_during_method_dispatch]
that prevents x.size_of_self()
and T::size_of_self(x)
from being resolved, while only allowing Pointee::size_of_self(x)
or <T as Pointee>::size_of_self(x)
)
This design actually seems a lot cleaner and simpler than the one I've previously proposed while still allowing for future extension with custom DSTs.
they are not actually sufficient for being
Metadata
The compiler has no need to be able to work with arbitrary impl Pointee for $Something {…}
custom definitions. In Rust 1.56 such an impl
is always disallowed because the compiler already generates impls of Pointee
for all types.
size_of_val
is not limited to methods available through the Pointee
trait (in the way generic library code would be) since it is a compiler intrinsic that can have special cases for all "kinds" of types supported by that compiler: arrays, trait objects, etc.
If the language is ever to gain support for custom DSTs (I’m not sure this is a necessity), then the RFC proposing to add them will need to define a mechanism for how size_of_val
should work for the new kind(s) of types. Maybe that would involve new methods in the Pointee
trait, maybe not.
I believe that stabilizing the Pointee
trait as-is does not prevent adding those methods later if needed for custom DSTs, because no custom impl
of Pointee
is allowed today.
Experience report, and the lack of CoerceUnsized
.
This afternoon I ported the storage-poc
repository from the rfc2580
crate to core::ptr::Pointee
, and it was a painless experience.
The resulting code is cleaner, thanks to the integration of from_raw_parts
and to_raw_parts
with *const T
and NonNull<T>
, and works just as well.
The one slight disappointment I have is that the CoerceUnsized
situation is unfortunately not solved. That is, if we look at the RawBox
type, where S::Handle<T>
would be NonNull<T>
for a regular allocator, and is (T::Metadata)
in the test case at line #159:
pub struct RawBox<T: ?Sized + Pointee, S: SingleElementStorage> {
storage: ManuallyDrop<S>,
handle: S::Handle<T>,
}
impl<T, U, S> CoerceUnsized<RawBox<U, S>> for RawBox<T, S>
where
T: ?Sized + Pointee,
U: ?Sized + Pointee,
S: SingleElementStorage,
S::Handle<T>: CoerceUnsized<S::Handle<U>>,
{
}
#[test]
fn slice_storage() {
let storage = SingleElement::<[u8; 4]>::new();
let mut boxed: RawBox<[u8], _> = RawBox::new([1u8, 2, 3], storage).unwrap().coerce();
assert_eq!([1u8, 2, 3], &*boxed);
boxed[2] = 4;
assert_eq!([1u8, 2, 4], &*boxed);
}
We would hope that the call to coerce()
is unnecessary, as we would expect that the [u8; 3]::Metadata
could be coerced into [u8]::Metadata
, however it is not the case.
It is not clear to me whether the issue comes from [u8; 3]::Metadata == ()
, but the plain fact is that we have a Box that is not automatically coerced which is a slight blow to usability.
Sorry, I don’t quite follow as there seems to be a lot of context involved as to what traits exist in storage-poc
. However isn’t this an issue with CoerceUnsized
unrelated to pointer metadata?
By the way explicit T: Pointee
bounds should be unnecessary. The trait resolver has built-in knowledge that T: Pointee
for any T
, so the associated type can be used without a bound. For example:
rust/library/core/src/ptr/metadata.rs
Line 93 in 84f962a
Sorry, I don’t quite follow as there seems to be a lot of context involved as to what traits exist in
storage-poc
. However isn’t this an issue with CoerceUnsized
unrelated to pointer metadata?
It's not clear to me where the issue lies, since it comes up with the interaction of the two features. Here is a reduced example on the playground:
#![feature(coerce_unsized)]
#![feature(ptr_metadata)]
use core::{ops::CoerceUnsized, ptr::Pointee};
struct Handle<T: ?Sized>(<T as Pointee>::Metadata);
impl<T, U> CoerceUnsized<Handle<U>> for Handle<T>
where
T: ?Sized,
U: ?Sized,
<T as Pointee>::Metadata: CoerceUnsized<<U as Pointee>::Metadata>,
{
}
fn main() {
//let _: Box<[u8]> = Box::new([1u8, 2, 3]);
let _: Handle<[u8]> = Handle::<[u8; 3]>(());
}
The Box line compiles, the Handle line doesn't.
If we dig further, we see that Metadata
doesn't properly implement CoerceUnsized
:
#![feature(coerce_unsized)]
#![feature(ptr_metadata)]
use core::{ops::CoerceUnsized, ptr::Pointee};
struct Foo<T: CoerceUnsized<<[u8] as Pointee>::Metadata> >(T);
fn foo(_: Foo<<[u8; 3] as Pointee>::Metadata>) {}
Fails with:
error[E0277]: the trait bound `(): CoerceUnsized<usize>` is not satisfied
--> src/main.rs:18:11
|
18 | fn foo(_: Foo<<[u8; 3] as Pointee>::Metadata>) {}
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ the trait `CoerceUnsized<usize>` is not implemented for `()`
|
note: required by a bound in `Foo`
--> src/main.rs:16:15
|
16 | struct Foo<T: CoerceUnsized<<[u8] as Pointee>::Metadata> >(T);
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ required by this bound in `Foo`
And given that <[u8; 3] as Pointee>::Metadata
is just ()
, I don't see how it could, actually, implement CoerceUnsized
properly, so I am tempted to think that the problem is here.
By the way explicit T: Pointee bounds should be unnecessary. The trait resolver has built-in knowledge that T: Pointee for any T, so the associated type can be used without a bound.
Ah nice! Thanks for the hint.
A CoerceUnsized
bound on the metadata does not really make any sense. It's not the metadata that gets unsized, after all. The metadata is produced during unsizing. E.g. when unsizing [u32; 3]
to [u32]
the metadata 3
is produced. But nowhere are we "unsizing" the original metadata type ()
to the new metadata type usize
.
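A small stable sketch of that point: the coercion below takes no metadata as input, and the length `3` shows up only in the resulting wide reference. The function names are made up for illustration.

```rust
use std::mem::size_of;

/// Unsizing coercion: the length `3` moves from the type into the pointer.
pub fn unsize(array: &[u32; 3]) -> &[u32] {
    array
}

/// Returns (thin reference size, wide reference size) in bytes.
pub fn pointer_sizes() -> (usize, usize) {
    (size_of::<&[u32; 3]>(), size_of::<&[u32]>())
}
```

The source reference is a single pointer with `()` as metadata; the second word of the result is created by the coercion itself rather than converted from anything.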
This feels like an XY-problem to me, so maybe try explaining what it is you want to achieve with the metadata APIs here (instead of how you want to achieve that).
This feels like an XY-problem to me, so maybe try explaining what it is you want to achieve with the metadata APIs here (instead of how you want to achieve that).
It may well be!
I pointed to the storage-poc
crate to provide the background; the idea of the crate is that instead of storing a pointer (NonNull<T>
), one stores a handle (Handle<T>
) which may or may not be a pointer under the hood.
In the case of the RawBox<T, S>
example above, it uses inline storage (an embedded array of bytes) in which the handle "points". The handle is not a pointer, though, as the box can be moved around, and since a box only ever stores a single element, the handle doesn't need any index or anything. It just needs the pointee metadata.
Therefore, in the example above, the handle is Handle<T>(<T as Pointee>::Metadata)
, and the box is RawBox<T, S>(S, S::Handle<T>)
.
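As a rough stable approximation of this design (not the actual storage-poc code), the sketch below restricts itself to `[u8]` so that the metadata is a plain `usize`. The type `InlineBytes` and its methods are hypothetical.

```rust
use std::ops::Deref;

/// Hypothetical inline box for `[u8]`: the bytes live inside the struct and
/// the "handle" is only the slice metadata (its length), not a pointer.
pub struct InlineBytes<const N: usize> {
    storage: [u8; N],
    len: usize, // stands in for `<[u8] as Pointee>::Metadata`
}

impl<const N: usize> InlineBytes<N> {
    /// Copies `data` into inline storage, or returns `None` if it does not fit.
    pub fn new(data: &[u8]) -> Option<Self> {
        if data.len() > N {
            return None;
        }
        let mut storage = [0u8; N];
        storage[..data.len()].copy_from_slice(data);
        Some(Self { storage, len: data.len() })
    }
}

impl<const N: usize> Deref for InlineBytes<N> {
    type Target = [u8];

    fn deref(&self) -> &[u8] {
        // The data address is recomputed from wherever the box currently is,
        // then joined with the stored metadata.
        // SAFETY: `len <= N` was checked in `new`, and `storage` is initialized.
        unsafe { std::slice::from_raw_parts(self.storage.as_ptr(), self.len) }
    }
}
```

Because no address is stored, the value can be moved freely and still dereferences correctly afterwards.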
The goal is for RawBox
to be coercible. At the moment, the rules for implementing CoerceUnsized
require that the last field be CoerceUnsized
and do not allow any computation -- it just magically happens -- and therefore I conclude that <T as Pointee>::Metadata
must be CoerceUnsized
, as this is necessary for Handle<T>
to be CoerceUnsized
, which in turn is necessary for RawBox<T, S>
to be CoerceUnsized
.
I may very well be misunderstanding the rules, though, and of course there is the option of allowing custom logic in CoerceUnsized
although this seems overkill to me.
My first impression is that the Rust unsizing system is simply not up to the task you are asking for here. Rust unsizing only works on pointers, and this is hard-coded pretty deeply in the compiler. You would need that system to be made more flexible so that one can talk about just the metadata generation part of the unsizing coercion without the part where that metadata is used to create a new wide pointer. Presumably such an extension of the unsizing system should suitably interact with the metadata APIs tracked in this issue, but that extension goes way beyond the scope of the metadata APIs.
At the moment, the rules for implementing CoerceUnsized require that the last field be CoerceUnsized
This is because that last field is the one that becomes unsized during the coercion (e.g. when RcBox<[u32; 3]>
is coerced to RcBox<[u32]>
). That's not what happens in your case so going further down this track won't lead anywhere.
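For comparison, here is a minimal stable sketch of the case the built-in rules do cover: a struct whose last field is the one that becomes unsized. `Counted` is a made-up stand-in for something like `RcBox`.

```rust
/// A header followed by a possibly unsized tail.
pub struct Counted<T: ?Sized> {
    pub count: usize,
    pub value: T,
}

/// The built-in unsizing coercion applies here because only the last field
/// mentions `T`, and that field is what changes from `[u32; 3]` to `[u32]`.
pub fn erase_len(counted: &Counted<[u32; 3]>) -> &Counted<[u32]> {
    counted
}
```

The wide reference to `Counted<[u32]>` carries the same kind of metadata as a `&[u32]`, which is why no user code needs to run during the coercion.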
Presumably such an extension of the unsizing system should suitably interact with the metadata APIs tracked in this issue, but that extension goes way beyond the scope of the metadata APIs.
Thanks, that's very helpful. This means I'm not doing anything wrong and it's just not supported today (at all).
The next question, then, is: Does the current implementation, using ()
as metadata for Sized
types, allow such an extension of the unsizing system?
Or, more prosaically: Are we comfortable stabilizing ()
as metadata for Sized
types, knowing that it likely closes the door to ever implementing CoerceUnsized
for it?
(The same question does not immediately apply to usize
as metadata for slice types, as those cannot be unsized further)
Simon had expressed concerns about a strongly-typed metadata piece for Sized
types due to the number of types this could explode into; however, I note that the current implementation seems to use a strongly-typed metadata piece for trait types already, so it's not clear if this is still a concern.
Or, more prosaically: Are we comfortable stabilizing () as metadata for Sized types, knowing that it likely closes the door to ever implementing CoerceUnsized for it?
I don't think we would ever want to implement CoerceUnsized
for the metadata of Sized types. It's not the metadata that is being unsized, after all! So IMO that would just make no sense.
However I should also add that I do not consider myself an unsizing expert. I have no idea what it would take to support types like yours.
Metadata unsizing could be done today:
#![feature(unsize)]
#![feature(ptr_metadata)]
use core::marker::Unsize;
use core::ptr::Pointee;
fn unsize_metadata<T: ?Sized, U: ?Sized>(t: <T as Pointee>::Metadata) -> <U as Pointee>::Metadata
where
T: Unsize<U>,
{
(core::ptr::from_raw_parts::<T>(core::ptr::null(), t) as *const U)
.to_raw_parts()
.1
}
fn main() {
let len = unsize_metadata::<[u8; 3], [u8]>(());
println!("{}", len);
}
So we could have a new-type wrapping around metadata to create a "strongly typed" metadata:
#![feature(unsize)]
#![feature(ptr_metadata)]
use core::marker::Unsize;
use core::ptr::Pointee;
struct TypedMetadata<T: ?Sized>(pub <T as Pointee>::Metadata);
impl<T> TypedMetadata<T> {
fn of() -> Self {
TypedMetadata(core::ptr::null::<T>().to_raw_parts().1)
}
}
impl<T: ?Sized> TypedMetadata<T> {
fn unsize<U: ?Sized>(self) -> TypedMetadata<U>
where
T: Unsize<U>,
{
TypedMetadata(
(core::ptr::from_raw_parts::<T>(core::ptr::null(), self.0) as *const U)
.to_raw_parts()
.1,
)
}
}
// This couldn't be done today, but we could support this in compiler similar to pointers.
// impl<T: ?Sized + Unsize<U>, U: ?Sized> CoerceUnsized<TypedMetadata<U>> for TypedMetadata<T> {}
fn main() {
let len = TypedMetadata::<[u8; 3]>::of().unsize::<[u8]>().0;
println!("{}", len);
}
This is not metadata that is being "unsized". Unsizing means going from (a pointer/reference to) a statically-sized type to (a pointer/reference to) a dynamically-sized type.
In your example, *const [u8; 3]
is unsized to *const [u8]
, where [u8]
is dynamically-sized. Your unsize_metadata
function then extracts the metadata of that unsized pointer. It also relies on ptr::null()
which has an implicit T: Sized
bound.
Implementing the CoerceUnsized
trait for metadata types (even if more-strongly-typed) does not make sense. Maybe this "return pointer metadata after unsizing null()
" operation can be useful but it’s a different operation from "unsize this pointer or reference" and shouldn’t be shoehorned into CoerceUnsized
.
TypedMetadata
could be understood as a ZST pointer to null.
That’s the kind of thing I mean by "shoehorning". Maybe not impossible but I don’t think it’s a good idea because that’s just not what CoerceUnsized
means / is for.
Implementing the
CoerceUnsized
trait for metadata types (even if more-strongly-typed) does not make sense. Maybe this "return pointer metadata after unsizing null()
" operation can be useful but it’s a different operation from "unsize this pointer or reference" and shouldn’t be shoehorned into CoerceUnsized
.
This is fair, but leaves us with the problem unsolved.
A great benefit of the ability to split/join a pointer and its associated metadata is the creation of custom handles which contain "something" (not necessarily a pointer) and the associated metadata, and today such custom handles cannot implement CoerceUnsized
making them less ergonomic than the language-supported pointers they emulate.
Implementing CoerceUnsized
for metadata would solve the issue quite naturally and offer the side benefit that no user-written code (with arbitrarily complex logic) would run during coercion.
Other possibilities include having a specific lang-item typed metadata for this situation, which itself implements CoerceUnsized
, perhaps a Null<T>
(as a parallel with NonNull<T>
), which I guess would be a different proposal and have your preference?
I'm writing a new crate with a bunch of unsafe
code that needs to avoid core::mem::swap()
so I had to opt into creating my own struct RefMut<'a>
(the real signature is a bit more complicated but that's not important here). I'd prefer this to be custom DST to not need a reborrow()
method and other things.
However my metadata happens to be &'same_lifetime_as_t mut Meta
. It'd be much nicer to have this as a true reference, not a pointer so that the compiler can take advantage of optimizations and maybe it could provide additional checking. The Copy
bound prevents this though. Thinking about this, I have some ideas on how to solve this without causing problems around lack of Copy
:
pub trait Pointee {
type Metadata: Copy + Send + Sync + Ord + Hash + Unpin;
type MetadataMut: CopyOrReborrow + Send + Sync + Ord + Hash + Unpin + Into<Self::Metadata>;
}
pub trait CopyOrReborrow {
type Output<'a> where Self: 'a; // maybe some funny lifetime bounds
fn copy_or_reborrow(&mut self) -> Self::Output<'_>;
}
impl<T: Copy> CopyOrReborrow for T {
type Output<'a> = T where Self: 'a;
fn copy_or_reborrow(&mut self) -> Self::Output<'_> {
*self
}
}
impl<T> CopyOrReborrow for &'_ mut T {
type Output<'a> = &'a mut T where Self: 'a;
fn copy_or_reborrow(&mut self) -> Self::Output<'_> {
&mut **self
}
}
fn metadata<T: Pointee>(value: &T) -> T::Metadata { ... }
fn metadata_mut<T: Pointee>(value: &mut T) -> T::MetadataMut { ... }
I believe that stabilizing the
Pointee
trait as-is does not prevent adding those methods later if needed for custom DSTs, because no custom impl of Pointee
is allowed today.
By the way explicit T: Pointee bounds should be unnecessary. The trait resolver has built-in knowledge that T: Pointee for any T, so the associated type can be used without a bound.
I see this as a problem because it would imply being able to do mem::swap()
of DSTs which would destroy my use case.
It seems that <T as Pointee>::Metadata
is always invariant in T
-- is that expected?
For example, these two constructs seem like they should be equivalent, but the latter fails: playground
#![feature(ptr_metadata)]
use std::ptr::{NonNull, Pointee};
pub struct Pointer<T: ?Sized>(NonNull<T>);
pub fn covariant_pointer<'a>(pointer: Pointer<&'static str>) -> Pointer<&'a str> {
pointer
}
pub struct Parts<T: ?Sized>(NonNull<()>, <T as Pointee>::Metadata);
pub fn covariant_parts<'a>(parts: Parts<&'static str>) -> Parts<&'a str> {
parts
}
error[E0308]: mismatched types
--> src/lib.rs:14:5
|
14 | parts
| ^^^^^ lifetime mismatch
|
= note: expected struct `Parts<&'a str>`
found struct `Parts<&'static str>`
note: the lifetime `'a` as defined here...
--> src/lib.rs:13:24
|
13 | pub fn covariant_parts<'a>(parts: Parts<&'static str>) -> Parts<&'a str> {
| ^^
= note: ...does not necessarily outlive the static lifetime
The Metadata
is just ()
here, since &str
is sized, but it also fails if you make that a slice or trait object with a lifetime inside. I guess it is expected that trait objects are invariant, but I think with sized and slice versions of T
it should be covariant.
Invariance is definitely expected (all projections are invariant), but also problematic here.
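The same behaviour can be reproduced on stable without `ptr_metadata`. In the sketch below, `HasAssoc` is a hypothetical trait standing in for `Pointee`, and `rebuild` shows one manual workaround, under the assumption that the projected type does not actually depend on the lifetime.

```rust
/// Hypothetical stand-in for `Pointee`, implemented for every type.
pub trait HasAssoc {
    type Assoc;
}

impl<T: ?Sized> HasAssoc for T {
    type Assoc = ();
}

/// Uses `T` directly, so it is covariant in `T`.
pub struct Direct<T>(pub T);

/// Uses `T` only through a projection, so it is invariant in `T`.
pub struct Projected<T: HasAssoc>(pub T::Assoc);

/// Accepted: a longer lifetime can be shortened through covariance.
pub fn shrink_direct<'a>(d: Direct<&'static str>) -> Direct<&'a str> {
    d
}

// Rejected with a lifetime mismatch, like `covariant_parts` above:
// pub fn shrink_projected<'a>(p: Projected<&'static str>) -> Projected<&'a str> { p }

/// Workaround: take the value apart and rebuild it at the new type. This is
/// accepted because both projections normalize to the same type, `()`.
pub fn rebuild<'a>(p: Projected<&'static str>) -> Projected<&'a str> {
    Projected(p.0)
}
```

The rebuild step is what a wrapper around `(NonNull<()>, Metadata)` would have to do by hand today to emulate covariance.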
Probably OT, but isn't this the same issue that's preventing TyKind from being refactored into its own crate for chalk integration?
There seems to be a problem with the combination of the ptr_metadata
and trait_upcasting
features.
I'm not sure if posting this over at #65991 might have been better, please let me know.
I'll use ThinBox as a motivating example but it applies to other custom "thin" implementations of datastructures.
The problem is that you can't implement
impl<T: ?Sized + Unsize<U>, U: ?Sized> CoerceUnsized<ThinBox<U>> for ThinBox<T> {} // error[E0277]: the trait bound `WithHeader<<T as Pointee>::Metadata>: CoerceUnsized<WithHeader<<U as Pointee>::Metadata>>` is not satisfied
which then leads to:
#![feature(trait_upcasting)]
trait Foo {}
trait Bar: Foo {}
impl Foo for i32 {}
impl Bar for i32 {}
let bar: Box<dyn Bar> = Box::new(123);
let foo: Box<dyn Foo> = bar; // works fine
let bar: ThinBox<dyn Bar> = ThinBox::new_unsize(123);
let foo: ThinBox<dyn Foo> = bar; // error[E0308]: mismatched types
This might be similar to what @matthieu-m mentioned here in relation to storage-poc
I don't know enough about these features to suggest a solution, but it would seem highly desirable to not prematurely close the door on such use-cases.
The problem is that you can't implement […]
CoerceUnsized<ThinBox<U>> for ThinBox<T>
That sounds expected to me.
CoerceUnsized
and the implicit coercion it enables are all about having the compiler automatically create a wide pointer-like value from the corresponding thin pointer-like value and target !Sized
type. ThinBox<U>
however is still thin, so the compiler can’t know where to put the new metadata. "Unsizing" a ThinBox
necessarily has to be a library API.
It’s even worse than that for ThinBox
: if T
and U
have differently-sized metadata, then ThinBox<T>
and ThinBox<U>
allocate heap memory with different layouts so converting between them requires a new allocation. So users who want a ThinBox<U>
would likely prefer some other API that does the "unsizing" while creating their first ThinBox
, without going through an intermediate ThinBox<T>
.
This lesser flexibility related to allocation layout is the tradeoff to make in exchange for thin pointers to DSTs.
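To illustrate the layout point with stable APIs only: assuming a thin box stores a metadata header immediately before the value (an assumption about the design, not a description of the real `ThinBox` internals), the allocation layout depends on the header type. `header_then_value` is a hypothetical helper.

```rust
use std::alloc::{Layout, LayoutError};

/// Layout of a hypothetical thin-box allocation: a metadata header `H`
/// followed (after any padding) by the value `T`.
/// Returns the combined layout and the byte offset of the value.
pub fn header_then_value<H, T>() -> Result<(Layout, usize), LayoutError> {
    let (layout, offset) = Layout::new::<H>().extend(Layout::new::<T>())?;
    Ok((layout.pad_to_align(), offset))
}
```

With `()` as the header the value sits at offset 0, while a `usize`-sized header pushes it back and raises the alignment, so an allocation made for one cannot simply be reinterpreted as the other.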
If I understand correctly, you're saying that CoerceUnsized
only does "thin" to "wide".
That then rules out any "thin" to "thin" conversions.
Is it then fair to say that trait upcasting is misusing CoerceUnsized
because it does a "wide" to "wide"
cast?
I didn’t know that trait upcasting also involved CoerceUnsized
. The important part of #81513 (comment) is that when the compiler implicitly generates new pointer metadata it puts it in a wide pointer, the target of the conversion. Although in the upcasting case, it probably also needs to know where to read the previous metadata. So if there is metadata in the source pointer, it needs to be in a wide pointer too.
I think I understand. There is currently only one place where the compiler knows how to find the vtable and that is inside a wide pointer. If it's somewhere else, no upcasting coercion for you.
Maybe we could have something like this in the future though to make manual implementations less error prone:
impl<Dyn: ?Sized> DynMetadata<Dyn> {
pub fn upcast<Dyn2: ?Sized>(self) -> DynMetadata<Dyn2>
where
Dyn: Unsize<Dyn2>,
{}
}
Calling a method on the metadata type means you already have access to the metadata, which doesn’t help with the compiler not knowing where to find it in ThinBox
.
Instead I would expect something like:
impl<T: ?Sized> ThinBox<T> {
fn try_upcast<U: ?Sized>(self) -> Result<ThinBox<U>, Self> where /* something */ {…}
}
to be provided by the thin box library. The library implementation of this might create a temporary raw pointer with the current metadata, upcast it with as
, then extract the metadata of that new raw pointer.
Anything more integrated into the language gets into Custom DST territory and opens a lot more design questions IMO.
As mentioned in the PR linked above - formatted debug prints of pointers based on what kind of pointer they are - I am wondering whether there should be a distinct trait for Pointee::Metadata
, rather than a list of constraints in the Pointee
trait itself. (Named something like PointerMetadata
or so.)
Arguably it's not much different re the Pointee
interface in practice, but would enable writing fns/impls for T: PointerMetadata
in the future...
I'm wondering whether the pointer could be changed from *const ()
to a new *const Opaque<T>
, so users could have a little help from the compiler in keeping straight which "artificially thin" pointers are to which types.
Maybe not worth the added complexity, but it would feel nicer not to lose all type information on to_raw_parts()
.
What would Opaque
be? Could you spell out some more the specific definitions and signatures you have in mind?
I'm not in a position to really write this up properly right now, but I think I may have a good case for adding a new type of pointer with additional, entirely dynamic metadata that couldn't be uniquely associated with the pointee type.
This wouldn't affect Pointee::Metadata
but it would maybe mean we should consider pointee_metadata
instead of metadata
. Hopefully I will be able to write up the full idea someplace.
I was thinking of something like
struct Opaque<T>(PhantomData<T>);
Hopefully with a nicer name than Opaque.
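A stable sketch of how such a marker could keep type information on the data half of a split pointer. Everything here is hypothetical: `split` and `join` are written for slices only, and the `?Sized` bound on `Opaque` is an addition so that it can name `[T]`.

```rust
use std::marker::PhantomData;

/// Zero-sized marker recording which pointee a thin data pointer belongs to.
pub struct Opaque<T: ?Sized>(PhantomData<T>);

/// Splits a slice reference into a typed thin pointer and its length.
pub fn split<T>(slice: &[T]) -> (*const Opaque<[T]>, usize) {
    (slice.as_ptr().cast::<Opaque<[T]>>(), slice.len())
}

/// Rejoins the two halves.
///
/// # Safety
/// `data` and `len` must come from `split` on a slice that is still alive
/// and not mutated for the whole of `'a`.
pub unsafe fn join<'a, T>(data: *const Opaque<[T]>, len: usize) -> &'a [T] {
    unsafe { std::slice::from_raw_parts(data.cast::<T>(), len) }
}
```

With this shape, passing the data half of a `[u16]` to a `join::<u8>` call is a type error instead of a silent mix-up.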
I think <[T] as Pointee>::Metadata == usize
is a forward compatibility hazard: if we ever want to allow [T]
to be used with T: !Sized
element types, it forces us into either:
- giving up and staying with the current slice types ~forever
- using some (unsound?) specialization to branch on whether T: Sized and use SliceMetadata<T> instead of usize in the T: !Sized case
It seems preferable to introduce SliceMetadata<T>
today, before Pointee
can be stabilized.
(And to avoid any unforeseen typesystem interactions, it should include a <T as Pointee>::Metadata
even if T: Sized
would hold for now)
EDIT: @BoxyUwU has pointed out to me that specializing on T: Sized
would probably not run into soundness issues (assuming custom DSTs don't accidentally require fragile bounds or something), but there's a worse problem:
This compiles today, but if Pointee
impls start using specialization, that will block normalization and break it:
fn foo<T>(x: usize) -> <[T] as Pointee>::Metadata {
x
}
(it's possible for us to eventually allow normalization of associated types that involve specialization, in some cases, but it's again a bunch of additional complexity to work around the hardcoded usize
)
@eddyb what would the metadata for a [[T]]
be, do you think? I do wonder whether it makes sense to use slice types there or to encourage people to build their own types (it seems to me like there is no "obviously correct" semantics to assign)
(it seems to me like there is no "obviously correct" semantics to assign)
Even if that is the case, a lot of decisions we make now could lock us into a future where all existing types are limited and users will end up having to use custom DSTs for everything else - this does not seem optimal to me.
I think it's even worse than just [T]
with unsized T
: tuples and struct
s do not wrap their inner metadata, so if we ever want to e.g. support (T, U)
with unsized T
and U
, we'd have:
- when T: Sized: <(T, U)>::Metadata == U::Metadata
- when T: !Sized: <(T, U)>::Metadata == (T::Metadata, U::Metadata)
This is, again, a "type(class) match" on T: Sized
, which I would think we'd really want to avoid.
Maybe exposing actually user-visible types was a mistake altogether and we should wrap them in a type invariant on the pointee type, with no user-visible normalization.
That is, we could hide metadata types in something like this (with perma-unstable RustcProvided*
):
struct MetadataOf<T: ?Sized>(<T as RustcProvidedPointee>::RustcProvidedMetadata);
impl<T: ?Sized> Pointee for T {
type Metadata = MetadataOf<T>; // maybe not even an assoc type at that point.
}
It's not perfect but at least the Metadata
is bound by enough auto traits for nothing other than the size to "leak out" of the definition of MetadataOf
, I don't think.
Another problem I've come across is "dynamic alignment". Consider something like this:
struct WithPrefixes<T: ?Sized> {
_prefix16: u16,
_prefix8: u8,
tail: T,
}
Today, we have two kinds of DSTs wrt alignment (static vs dynamic), which results in:
- WithPrefixes<[T]> has a static tail offset of round_up(3, align_of::<T>())
- WithPrefixes<dyn Trait> has a dynamically computed tail offset round_up(3, max(2, align_of_val(&self.tail))) with align_of_val reading from dyn Trait's vtable
  - the metadata is (today) DynMetadata<dyn Trait>, i.e. just the dyn Trait vtable
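The runtime computation can be observed on stable. The sketch below adds `#[repr(C)]` (an assumption, to pin the field order) and measures the tail offset against a prediction made from the vtable alignment. It rounds up to the tail's own alignment, while the `max` with the prefix alignment gives the alignment of the struct as a whole. The helper names are invented.

```rust
use std::fmt::Debug;
use std::mem::align_of_val;

#[repr(C)] // assumption: fixed field order so the arithmetic is predictable
pub struct WithPrefixes<T: ?Sized> {
    pub _prefix16: u16,
    pub _prefix8: u8,
    pub tail: T,
}

/// Rounds `offset` up to the next multiple of `align` (a power of two).
pub fn round_up(offset: usize, align: usize) -> usize {
    (offset + align - 1) & !(align - 1)
}

/// Tail offset predicted from the alignment stored in the vtable.
pub fn predicted_tail_offset(value: &WithPrefixes<dyn Debug>) -> usize {
    round_up(3, align_of_val(&value.tail))
}

/// Tail offset measured from the actual field address.
pub fn measured_tail_offset(value: &WithPrefixes<dyn Debug>) -> usize {
    let base = value as *const _ as *const u8 as usize;
    let tail = &value.tail as *const _ as *const u8 as usize;
    tail - base
}

/// Alignment of the whole struct, also computed at run time.
pub fn struct_align(value: &WithPrefixes<dyn Debug>) -> usize {
    align_of_val(value)
}
```

Every access to `tail` through a `dyn` pointer pays for this lookup and rounding, which is the cost a precomputed offset in an extended vtable would avoid.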
We could remove the runtime max
and round_up
by instead having this setup:
// Assuming this is `DynMetadata` today and `DynVtable` exists:
struct DynMetadata<T: ?Sized>(&'static DynVtable<T>);
struct WithPrefixesDynVtable<T: ?Sized> {
tail_vtable: DynVtable<T>,
tail_offset: usize,
}
struct WithPrefixesDynMetadata<T: ?Sized>(&'static WithPrefixesDynVtable<T>);
Then getting the offset of the tail
field is as cheap as align_of_val(&self.tail)
today.
However, note that WithPrefixes<T>
now can have different metadata from T
, which is another change from the current Pointee
setup (and would also benefit from having the real Metadata
type hidden).
Also, long-term we probably need something like this to support Option<dyn Trait>
, if we want to ever attempt it - at least, it makes far more sense to have precomputed field offsets (and a tag decoder fn
pointer, pretty much exactly mem::discriminant as fn(_) -> _
) in an "extended vtable", than try to generate some code that uses only dyn Trait
vtable, somehow.
- using some (unsound?) specialization to branch on whether T: Sized
Today’s impls of the Pointee
trait are made up through compiler magic, so this kind of branch/special case would be possible without #![feature(specialization)]
- using some (unsound?) specialization to branch on whether T: Sized
Today’s impls of the
Pointee
trait are made up through compiler magic, so this kind of branch/special case would be possible without #![feature(specialization)]
That's worse - the "compiler magic" has none of the checks or analyses in place that we've at least tried to add to specialization. And unless we make that "compiler magic" work more like specialization before stabilization, `Pointee` will be able to do things on stable that specialization is disallowed from doing even on nightly!
That is, I'm not aware of a way to write a `T: Sized` "type-level branch" that resolves during user-facing type-checking, as associated type specialization is intentionally firewalled from it (and I would expect we'd hold CTFE to the same standard, but I haven't done a thorough investigation of e.g. `const fn`s in traits).
(Meanwhile, the way raw pointer casts work seems to imply the equality of metadata, just like the `Pointee` docs, despite that not really being something we should've ever promised in any way)
We've done a much better job with `std::mem::Discriminant` of hiding the "compiler magic" in a newtype, and we should be doing the same thing here IMO.
Anyway, I can't really block any of this and I'm guessing people are more interested in custom DSTs than removing artificial limitations that only really exist because we barely managed to get the built-in DSTs we have today.
The tragic irony being, of course, that most of the implementation work on custom DSTs would be passing arbitrary amounts of metadata around. The same blocker as for a lot of the built-in stuff we never got to do.
I think `<[T] as Pointee>::Metadata == usize` is a forward compatibility hazard: if we were ever to allow `[T]` to be used with `T: !Sized` element types, it forces us into either:
* giving up and staying with the current slice types ~forever
* using some ~(unsound?)~ specialization to branch on whether `T: Sized` and use `SliceMetadata<T>` instead of `usize` in the `T: !Sized` case

It seems preferable to introduce `SliceMetadata<T>` today, before `Pointee` can be stabilized. (And to avoid any unforeseen typesystem interactions, it should include a `<T as Pointee>::Metadata` even if `T: Sized` would hold for now)
Using a `SliceMetadata` wrapper wouldn't solve anything I think. `[[T]]` would need unsized metadata, right? Unsized metadata is impossible as `std::ptr::metadata` has to work for any `*const T where T: ?Sized`. A function can't return an unsized value.
Excuse my ignorance, but how would `[[T]]` even work?
I think `[U]` where the metadata of `&U` is allowed to vary between elements would be a ludicrous thing[1], because it would require `&[U]` to store a metadata for each element, which would make the pointer itself unsized. But there are some imaginable variations:
- `[U]` where `U` is unsized could be syntactic sugar for `[(<U as Pointee>::Metadata, U)]`, sort of a Pascal-style array. In this case, no change to the metadata of `&[U]` would be needed. So while there are definitely some potential use cases for Pascal-style strings, say, there's no need for any further conversation here I think.
- `[U]` could require that all elements have the same metadata. This would make `&[[T]]` into a typical multi-dimensional array, definitely a case that sees use in practice. The expectation would presumably be that the outer reference would carry the (singular) metadata to avoid needing multiple copies of it. I could see some special syntax to remind users that the inner size must be the same, e.g. `&[[T]; dyn]`, but that's neither here nor there.
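For the second variation, a library-level approximation on stable Rust might look like the sketch below: the outer handle carries the single shared inner length, so no per-element metadata is stored. `UniformRows` and its methods are hypothetical names for illustration, not a proposed API, and a real `&[[T]]` would be a language-level pointer rather than a struct.

```rust
// Hypothetical stand-in for `&[[T]]` where every inner slice has the same length.
struct UniformRows<'a, T> {
    data: &'a [T],    // all rows, stored back to back
    inner_len: usize, // the one shared piece of inner metadata
}

impl<'a, T> UniformRows<'a, T> {
    // Returns `None` if the data cannot be split evenly into rows.
    fn new(data: &'a [T], inner_len: usize) -> Option<Self> {
        if inner_len == 0 || data.len() % inner_len != 0 {
            return None;
        }
        Some(Self { data, inner_len })
    }

    // Number of rows (the "outer" length).
    fn len(&self) -> usize {
        self.data.len() / self.inner_len
    }

    // Borrow row `i`, or `None` if it is out of bounds.
    fn row(&self, i: usize) -> Option<&'a [T]> {
        let start = i.checked_mul(self.inner_len)?;
        let end = start.checked_add(self.inner_len)?;
        self.data.get(start..end)
    }
}
```

The handle here is three words wide (data pointer, total length, inner length), which is the kind of non-`usize` slice metadata that an opaque metadata type would leave room for.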
Honestly, I think the future-compatibility concerns are worth heeding here. Stabilising that size==stride has caused me a significant amount of headache, and completely cut us off from ABI interoperability with languages where that isn't true (h/t @Gankra for bringing other examples to light), so I'm all for conservatism here.
Given the options, I'd be inclined to just make `SliceMetadata` opaque for now like `DynMetadata` is, and throw a `len()` method onto it to get the actual length. I don't think there's any reason to consider unification with `usize` a feature, more of a design accident.
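As a sketch of what "opaque with a `len()` method" could look like, assuming nothing about the eventual standard-library design: the module name `metadata` and the constructor `of` are hypothetical, and this version stores only a length (it omits the `<T as Pointee>::Metadata` component suggested earlier, which cannot be written on stable).

```rust
mod metadata {
    use std::marker::PhantomData;

    // Opaque wrapper: callers cannot see or rely on the `usize` inside.
    pub struct SliceMetadata<T> {
        len: usize,
        _element: PhantomData<fn() -> T>,
    }

    impl<T> SliceMetadata<T> {
        // Hypothetical constructor standing in for `ptr::metadata`.
        pub fn of(slice: &[T]) -> Self {
            Self { len: slice.len(), _element: PhantomData }
        }

        // The only way to observe the length from outside the module.
        pub fn len(&self) -> usize {
            self.len
        }
    }
}

use metadata::SliceMetadata;

// Code outside the module has to go through the accessor.
fn describe(bytes: &[u8]) -> usize {
    SliceMetadata::of(bytes).len()
}
```

Because the field is private, the representation could later grow extra components without breaking callers, which is the forward-compatibility argument being made here.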
[1]: `U` being implicitly unsized, since if it is sized, its metadata is `()` and does not vary.