Tracking issue for RFC 1861: Extern types
aturon opened this issue · 291 comments
This is a tracking issue for RFC 1861 "Extern types".
Steps:
- Implement the RFC (#44295)
- Adjust documentation (see instructions on forge)
- Stabilization PR (see instructions on forge)
Unresolved questions:
- Rust does not support types that don't have dynamically computed alignment -- we need the alignment to compute the field offset in structs. extern type violates this basic assumption, causing pain, suffering, and ICEs all over the compiler. What is the principled fix for this?
- Should we allow generic lifetime and type parameters on extern types? If so, how do they affect the type in terms of variance?
- In std's source, it is mentioned that LLVM expects i8* for C's void*. We'd need to continue to hack this for the two c_voids in std and libc. But perhaps this should be done across-the-board for all extern types? Somebody should check what Clang does. Also see #59095. RESOLVED because all pointer types are ptr now.
- How should this interact with unsized arguments? Currently it ICEs: #115709
This is not explicitly mentioned in the RFC, but I'm assuming different instances of extern type are actually different types? Meaning this would be illegal:
extern {
type A;
type B;
}
fn convert_ref(r: &A) -> &B { r }
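A minimal sketch of the nominal-typing behaviour being asked about, using the zero-sized-struct idiom as a stable stand-in for extern type. The struct definitions and the convert_ptr helper are illustrative assumptions, not part of the RFC text; the point is only that two separately declared opaque types do not coerce into each other and need an explicit raw-pointer cast.

```rust
// Stable stand-ins for `extern { type A; type B; }` (assumption: the
// zero-sized `#[repr(C)]` struct idiom, not the unstable feature itself).
#[repr(C)]
pub struct A {
    _private: [u8; 0],
}

#[repr(C)]
pub struct B {
    _private: [u8; 0],
}

// This would be rejected with a mismatched-types error, because `A` and `B`
// are distinct nominal types:
//
//     fn convert_ref(r: &A) -> &B { r }

/// Hypothetical helper: crossing between the two types needs an explicit
/// raw-pointer cast, and the caller is responsible for its validity.
pub fn convert_ptr(p: *const A) -> *const B {
    p as *const B
}

fn main() {
    let a = A { _private: [] };
    let pa: *const A = &a;
    let pb = convert_ptr(pa);
    // The cast changes the static type only, not the address.
    assert_eq!(pa as usize, pb as usize);
}
```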
Relatedly, is deciding whether we want to call it extern type or extern struct something that can still be done as part of the stabilization process, or is the extern type syntax effectively final as of having accepted the RFC?
EDIT: rust-lang/rfcs#2071 is also relevant here w.r.t. the connotations of type "aliases". In stable Rust a type declaration is "effect-free" and just a transparent alias for some existing type. Both extern type and type Foo = impl Bar would change this by making it implicitly generate a new module-scoped existential type or type constructor (nominal type) for it to refer to.
Can we get a bullet for the panic vs DynSized debate?
I've started working on this, and I have a working simple initial version (no generics, no DynSized).
I've however noticed a slight usability issue. In FFI code, it's frequent for raw pointers to be initialized to null using std::ptr::null/null_mut. However, the function only accepts sized type arguments, since it would not be able to pick a metadata for the fat pointer.
Despite being unsized, extern types are used through thin pointers, so it should be possible to use std::ptr::null.
It is still possible to cast an integer to an extern type pointer, but this is not as nice as just using the function designed for this. Also this can never be done in a generic context.
extern {
type foo;
}
fn null_foo() -> *const foo {
0usize as *const foo
}
What we'd really want is a new trait to distinguish types which use thin pointers. It would be implemented automatically for all sized types and extern types. Then the cast above would succeed whenever the type is bounded by this trait. E.g., the function std::ptr::null becomes:
fn null<T: ?Sized + Thin>() -> *const T {
0usize as *const T
}
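A rough sketch of what such a bound could look like as a library-level trait on stable Rust. Everything here is hypothetical: the Thin trait, its null_ptr method, and the Opaque stand-in are assumptions for illustration, and because stable Rust has no extern types every implementor below is actually Sized, so this only shows the shape of the bound rather than the compiler-implemented trait the comment asks for.

```rust
use std::ptr;

/// Hypothetical marker trait (not a std API): pointers to implementors
/// carry no metadata. Unsafe to implement, since claiming it for a type
/// with fat pointers would be a lie.
pub unsafe trait Thin {
    /// A null pointer to `Self`; well-defined because no metadata is needed.
    fn null_ptr() -> *const Self;
}

// Every sized type has thin pointers. A real design would also cover
// extern types, which cannot be expressed on stable Rust.
unsafe impl<T> Thin for T {
    fn null_ptr() -> *const T {
        ptr::null()
    }
}

/// The shape proposed above for `std::ptr::null`.
pub fn null<T: ?Sized + Thin>() -> *const T {
    T::null_ptr()
}

/// Stand-in for an extern type (assumption: zero-sized struct idiom).
#[repr(C)]
pub struct Opaque {
    _private: [u8; 0],
}

fn main() {
    let p: *const Opaque = null();
    assert!(p.is_null());
}
```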
However there's a risk of more and more such traits creeping up, such as DynSized, making it confusing for users. There's also some overlap with the various custom DST RFC proposals which allow arbitrary metadata. For instance, instead of Thin, Referent<Meta=()> could be used.
I think we can add extern types now and live with std::ptr::null not supporting them for a while until we figure out what to do about Thin/DynSized/Referent<Meta=…> etc.
@SimonSapin yeah, it's definitely a minor concern for now.
I do think this problem of not having a trait bound to express "this type may be unsized but must have a thin pointer" might crop up in other places though.
Oh yeah, I agree we should solve that eventually too. I'm only saying we might not need to solve all of it before we ship any of it.
Auto traits are not implemented by default, since the contents of extern types are unknown. This means extern types are !Sync, !Send and !Freeze. This seems like the correct behaviour to me. Manual unsafe impl Sync for Foo is still possible.
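To make the opt-in behaviour concrete, here is a small sketch on stable Rust. It assumes a stand-in type built from UnsafeCell and PhantomData rather than a real extern type, and the assert_sync helper is a hypothetical name; the unsafe impl is the same manual promise described above.

```rust
use std::cell::UnsafeCell;
use std::marker::PhantomData;

/// Stand-in for an extern type (assumption, not the unstable feature).
pub struct Foo {
    _data: UnsafeCell<()>,         // opts out of `Sync`
    _marker: PhantomData<*mut ()>, // opts out of `Send` (and `Sync`)
}

// Opting back in is a manual, unsafe promise about the foreign code.
unsafe impl Sync for Foo {}

/// Hypothetical helper: compiles only if `T: Sync`.
fn assert_sync<T: ?Sized + Sync>() {}

fn main() {
    // Accepted only because of the `unsafe impl` above.
    assert_sync::<Foo>();
    // A matching `assert_send::<Foo>()` would be rejected at compile time,
    // since nothing opts `Foo` back into `Send`.
}
```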
While it is possible for Sync, Send, UnwindSafe and RefUnwindSafe, doing impl Freeze for Foo is not possible as it is a private trait in libcore. This means it is impossible to convince the compiler that an extern type is cell-free.
Should Freeze be made public (even if #[doc(hidden)])? cc @eddyb #41349.
Or is it possible to declare that an extern type is safe-by-default, so that auto traits are opt-out instead of opt-in?
extern {
#[unsafe_impl_all_auto_traits_by_default]
type Foo;
}
impl !Send for Foo {}
@kennytm What's the usecase? The semantics of extern type are more or less that of a hack being used before the RFC, which is struct Opaque(UnsafeCell<()>);, so the lack of Freeze fits.
That prevents rustc from telling LLVM anything different from what C signatures in clang result in.
@eddyb Use case: Trying to see if it's possible to make CStr a thin DST.
I don't see anything related to a cell in #44295? It is reported to LLVM as an i8 similar to str. And the places where librustc_trans involves the Freeze trait reads the real type, not the LLVM type, so LLVM treating all extern type as i8 should be irrelevant?
@kennytm So with extern type CStr;, writes through &CStr would be legal, and you don't want that?
The Freeze trait is private because it's used to detect UnsafeCell and not meant to be overridden.
The original intent was to match the extern type CStr with the existing behavior of struct CStr([c_char]), which is Freeze. @eddyb and I discussed this on IRC, and the outcome was that (1) Freeze is mainly used to disable certain optimizations only and (2) as Freeze is a private trait, no one other than the compiler will rely on it. So the missing Freeze trait will be irrelevant for extern type CStr.
Regarding the thin pointer issue, I imagine that const generics will eventually enable constant comparisons in where clauses - if size_of is made const also, that would let you write a bound of where size_of::<Ptr>() == size_of::<usize>() which IMO matches the intent pretty perfectly.
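The size comparison itself can already be evaluated today as an ordinary expression, just not as a where-clause bound. The sketch below assumes a hypothetical helper name, has_thin_pointer, and only shows which existing kinds of types have one-word pointers.

```rust
use std::fmt::Debug;
use std::mem::size_of;

/// Hypothetical helper: true when `*const T` is one machine word,
/// i.e. the pointer carries no length or vtable metadata.
fn has_thin_pointer<T: ?Sized>() -> bool {
    size_of::<*const T>() == size_of::<usize>()
}

fn main() {
    assert!(has_thin_pointer::<u8>()); // sized type: thin pointer
    assert!(!has_thin_pointer::<[u8]>()); // slice: pointer + length
    assert!(!has_thin_pointer::<str>()); // str: pointer + length
    assert!(!has_thin_pointer::<dyn Debug>()); // trait object: pointer + vtable
}
```

This is a check on values computed after type checking, so unlike the proposed bound it cannot make a cast such as 0usize as *const T compile in a generic function.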
This is the first I hear of allowing const expressions in where clauses. While it could be very useful, it seems far from given that this will be accepted into the language.
@SimonSapin Const expression in where clause will eventually be needed for const generics beyond RFC 2000 (spelled as with in rust-lang/rfcs#1932), but I do think this extension is out-of-scope for extern type or even custom DST in general.
I didn't mean to assume that that will be supported, I meant that if it did become possible in a reasonable timeframe, which apparently is not likely, I think it'd be nice to have more regular syntax to express some ideas rather than more marker traits with special meaning given by the compiler. If that's not going to work, then great, it's one less thing to consider.
One of my use cases for extern types is to represent opaque things from the macOS frameworks with them.
For that purpose, I actually want to be able to wrap opaque things in some generic types to encode certain invariants related to refcounting.
For example, I want a CFRef<T> type that derefs to CFShared<T> that itself derefs to T.
pub struct CFRef<T>(*const CFShared<T>);
pub struct CFShared<T>(T);
This is apparently not possible if T is an extern type.
pub extern type CFString;
fn do_stuff_with_shared_string(str: &CFShared<CFString>) { ... }
Would it be complicated to support such a thing?
@nox struct CFShared<T: ?Sized>(T);?
Note that this may not compile after we have implemented the DynSized trait since an extern type is not DynSized, and we can't place a !DynSized field inside a struct (may need explicit #[repr(transparent)] to allow it).
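As an illustration of the ?Sized suggestion, here is a sketch on stable Rust that uses [u8] as a stand-in for the unsized payload, since extern types are unstable. The Deref impl and the payload_len function are assumptions added for the example; whether the same wrapper would accept an extern type is exactly the open question above.

```rust
use std::ops::Deref;

/// The wrapper from the comment above, relaxed with `?Sized`.
pub struct CFShared<T: ?Sized>(T);

impl<T: ?Sized> Deref for CFShared<T> {
    type Target = T;

    fn deref(&self) -> &T {
        &self.0
    }
}

/// Hypothetical function taking a wrapper around an unsized payload.
fn payload_len(s: &CFShared<[u8]>) -> usize {
    s.len() // auto-deref from CFShared<[u8]> to [u8]
}

fn main() {
    let sized = CFShared([1u8, 2, 3]);
    // Unsizing coercion: &CFShared<[u8; 3]> to &CFShared<[u8]>.
    let shared: &CFShared<[u8]> = &sized;
    assert_eq!(payload_len(shared), 3);
}
```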
Can somebody provide an update on the status of this and maybe summarize the remaining open issues? I'd like to know if it is possible already to implement c_void in libc/core using extern type, for example.
@gnzlbg as mentioned in the initial post in this issue:
In std's source, it is mentioned that LLVM expects i8* for C's void*. We'd need to continue to hack this for the two c_voids in std and libc. But perhaps this should be done across-the-board for all extern types? Somebody should check what Clang does.
There is no solution for this yet, I think.
@gnzlbg Whether we want DynSized needs to be resolved as well. The description of the initial implementation does a great job of laying out the footguns that exist without it. Thanks @plietar!
C Opaque struct:
typedef struct c_void c_void;
c_void* malloc(unsigned long size);
void call_malloc() {
malloc(1);
}
clang version 5.0.1:
%struct.c_void = type opaque
define void @call_malloc() #0 {
%1 = call %struct.c_void* @malloc(i64 1)
ret void
}
declare %struct.c_void* @malloc(i64) #1
C void:
void* malloc(unsigned long size);
void call_malloc() {
malloc(1);
}
clang version 5.0.1:
define void @call_malloc() #0 {
%1 = call i8* @malloc(i64 1)
ret void
}
declare i8* @malloc(i64) #1
Rust extern type:
#![feature(extern_types)]
#![crate_type="lib"]
extern "C" {
type c_void;
fn malloc(n: usize) -> *mut c_void;
}
#[no_mangle]
pub fn call_malloc() {
unsafe { malloc(1); }
}
Rust nightly:
%"::c_void" = type {}
define void @call_malloc() unnamed_addr #0 !dbg !4 {
start:
%0 = call %"::c_void"* @malloc(i64 1), !dbg !8
br label %bb1, !dbg !8
bb1: ; preds = %start
ret void, !dbg !10
}
declare %"::c_void"* @malloc(i64) unnamed_addr #1
Great find! This %struct.c_void = type opaque sure looks worth imitating. If that's slower than i8* IMO that's a clang/LLVM bug to report.
Would this syntax be acceptable?
extern {
#[repr(i8)]
type c_void;
}
@jethrogb looks good to me, but I'd be tempted to keep it unstable as it only exists to hack around LLVM.
I'd be tempted to keep it unstable as it only exists to hack around LLVM.
That sounds good in principle, but I think there were plans to have the final public definition of c_void live in a crates.io crate.
@jethrogb hehe just quoted those plans. I do see the tension, bummer.
Great find! This %struct.c_void = type opaque sure looks worth imitating. If that's slower than i8* IMO that's a clang/LLVM bug to report.
It is definitely slower than i8*. For example, malloc declared as returning an opaque struct won't be recognized as malloc by most (all?) optimization passes. It's a known issue and according to some LLVM devs the right way to fix it is to get rid of pointee types altogether and have just one pointer type, but that doesn't seem to be happening any time soon so you'll have to emit i8*.
@whitequark do you happen to have a link to the LLVM bug?
@gnzlbg I'm not sure if there's one, that was from IRC discussions.
I've tried to search for one without any luck so I've filed this one: https://bugs.llvm.org/show_bug.cgi?id=36795
What @whitequark pointed out looks correct, when using i8* LLVM can eliminate calls to malloc, while when using a type opaque c_void* it cannot.
In the RFC:
As a DST, size_of and align_of do not work, but we must also be careful that size_of_val and align_of_val do not work either, as there is not necessarily a way at run-time to get the size of extern types either. For an initial implementation, those methods can just panic, but before this is stabilized there should be some trait bound or similar on them that prevents their use statically. The exact mechanism is more the domain of the custom DST RFC, RFC 1524, and so figuring that mechanism out will be delegated to it.
However RFC 1524 was closed. Its successor is probably rust-lang/rfcs#2255, but that's an issue rather than a PR for a new RFC.
Per #46108 (comment) the lang team recently decided against having a DynSized trait. But that leaves an unresolved question in this open RFC.
In rustc 1.26.0-nightly (9c9424d 2018-03-27), this compiles without warning and prints 0:
#![feature(extern_types)]
extern { type void; }
fn main() {
let x: *const void = main as *const _;
println!("{}", std::mem::size_of_val(unsafe { &*x }));
}
The libs team discussed defining a public void extern type in the standard library and changing the return type of memory allocation APIs to *mut void instead of *mut u8. However in that case we'd need to decide what to do about size_of_val + extern types before allocations APIs are stabilized. (Keeping void unstable wouldn't help, if you can obtain a pointer to it you don't need to name the type to call size_of_val.)
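For contrast with the extern type case, this is how size_of_val behaves for the dynamically sized types that exist on stable Rust, where the size can always be recovered from the pointer metadata (a length or a vtable). The sizes function is a hypothetical name used only to group the three calls.

```rust
use std::fmt::Debug;
use std::mem::size_of_val;

/// Hypothetical helper returning the dynamic sizes of three DST values.
fn sizes() -> (usize, usize, usize) {
    let slice: &[u16] = &[1, 2, 3];
    let text: &str = "abc";
    let object: &dyn Debug = &7u32;
    (
        size_of_val(slice),  // 3 elements * 2 bytes, from the slice length
        size_of_val(text),   // 3 bytes, from the str length
        size_of_val(object), // 4 bytes, read from the vtable
    )
}

fn main() {
    assert_eq!(sizes(), (6, 3, 4));
}
```

A pointer to an extern type has no such metadata, which is why the call in the snippet above has nothing meaningful to return.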
CC @rust-lang/lang
rust-lang/rfcs#1524 (Custom DST) is orthogonal to rust-lang/rfcs#2255 (Whether we want more ?Trait). The successor is https://internals.rust-lang.org/t/pre-erfc-lets-fix-dsts/6663.
In #46108 we decided against ?DynSized, but I think a DynSized without a ? (e.g. rust-lang/rfcs#2310 or rust-lang/rfcs#2255 (comment)) is still on the table.
BTW for consistency with common C extensions, if the size_of void cannot be undefined, it should be set to 1.
Conclusions from the lang meeting at the all-hands:
- size_of_val should panic if called on an extern type
- We should have a best-effort lint to statically detect if you call size_of_val on an extern type, either directly or ideally also through a generic.
- None of this impacts the ability to do custom DSTs.
Does anyone have a specific good reason this shouldn't panic, and should instead abort?
ideally also through a generic.
Would this result in a monomorphization time error?
@gnzlbg It sounded like folks were generally in favor of a monomorphization-time lint.
Would appreciate if someone could also elaborate on the reasoning behind these conclusions. :)
Rough summary: extern type is a special-purpose feature that exists for FFI, so adding a pile of trait machinery to statically detect and reject calls to size_of_val on one didn't seem worth it. We had a very strong consensus against returning a sentinel value, which left us with "either panic or abort". There was some discussion about whether we had any motivation to abort, but we couldn't think of any specific cases where panic would lead to breakage. Finally, we still do like the idea of statically detecting these issues, but we can do that with a lint for at least the most common cases.
@glaebhoerl Hey =) It's kind of hard to write up a detailed comment just now, but I want to say a few things. First off, like any weighty decision, I would describe this as a "preliminary conclusion", subject as always to revision if persuasive counterarguments arise. =)
As for reasoning, there are some minutes from discussion here but they're pretty brief. Here is my attempt at a summary of the key points as I remember them:
- The way the "desugaring" of T: ?Sized works is already fairly surprising to users as is.
  - Extending to a three-layer hierarchy makes it quite tongue twisting even for advanced users:
    - You say T to mean T: Sized
    - You say T: ?Sized to mean T: DynSized
    - You say T: ?DynSized to mean T
- We would like to extend to custom DSTs in the future; this is often cited as being connected to DynSized, but that doesn't seem entirely complete. We can still have a DynSized trait (or family of traits), but they don't have to be supplied by default:
  - If you write T, you get Sized so you're all set
  - If you write T: ?Sized, you get nothing, but have to add other bounds just like ordinary bounds
    - it does mean that size_of_val and align_of_val are always invokable for any type
      - but these are the most general case anyway (when you have the full value + its metadata); we're covering the hard case now.
- When we have custom DST, that implies that size_of_val and friends will run user code anyway. That code could panic.
  - Given that, we will have the possibility of size_of_val panicking.
- There was some mild concern that size_of_val might execute in unsafe code that is not panic safe, creating a footgun.
  - We could make it hard abort instead -- also if user code panics.
  - But we wanted more persuasive arguments, e.g. examples of code in the wild that would have a problem (brief inspection of code in the standard library didn't turn up such problems, but I didn't really look especially hard).
Thanks!
(To be clear I'm skeptical about the value of ?DynSized as well, at least on its own, when its only utility would be to prevent misuse of size_of_val.)
I was mainly curious about the reasoning around the choice of "panic" versus "return 0". I don't think of 0 as being a sentinel value in this case, if "sentinel value" is understood as "something the caller has to check for specifically and handle specially". I agree that panicking is preferable to this.
I think of 0 as a "safest possible default value" -- that is, if someone asks for the size_of_value of an extern type, gets 0, and proceeds to read and write 0 bytes to and from memory, the effect will be that of a no-op, which is unlikely to actually cause any problems. The question is what (if any) scenarios are there where it would. (I might have asked this same question on the extern type RFC thread and someone might have even tried to answer it...)
Note that making align_of_val panic also means that field access will potentially panic:
extern { type Opaque; }
struct TerribleOpaque {
a: u8,
b: Opaque,
}
let a: &TerribleOpaque = unsafe { ... };
let b: &Opaque = &a.b; // <-- this line will panic.
struct GenericThingy<T: ?Sized> {
c: u8,
d: T,
}
let c: &GenericThingy<Opaque> = ...;
let d: &Opaque = &c.d; // <-- this line will also panic.
Returning a dummy value rather than asserting/panicking seems really unlike Rust, and not something we typically do. We don't do things like returning -1; we use Option, or we panic or assert.
@nikomatsakis could we keep this unstable until we have a custom DST experiment then? Or we could stabilize this with unstable DynSized which would just prevent generics over extern types in practice which is probably fine, while allowing it to be removed later based on DST experiment
but these are the most general case anyway (when you have the full value + its metadata); we're covering the hard case now.
You mean custom DSTs in practice would all implement DynSized?
Given that, we will have the possibility of size_of_val panicking.
Sure any code may panic, but using a panic to enforce a static invariant still leaves a bitter taste in my mouth. If a library makes a "false instance" that is considered bad form. This only is different because of concerns about opt-in traits which may tip the scales in aggregate but doesn't address this problem.
I want a solution that doesn't feel born out of tragic trade-offs.
@Ericson2314 The expectation from the discussion was that a lint ought to be able to catch the vast majority of such cases.
I think of 0 as a "safest possible default value" -- that is, if someone asks for the size_of_value of an extern type, gets 0, and proceeds to read and write 0 bytes to and from memory, the effect will be that of a no-op, which is unlikely to actually cause any problems.
Let's say I'm trying to serialize a value (in a generic function, as it goes) somewhere and use size_of_val for that. Now, when I deserialize it, I have a problem.
@joshtriplett I agree, and the point of my comment was to explain why I think this is unlike that.
@whitequark Thanks. To have a problem, you'd need the deserialization code to somehow derive a different value? How could/would that end up happening? (I guess if the deserialization happens in C? But then why is the Rust code using generics to hand-roll its own serialization instead of calling C?)
(I just want to be duly diligent and identify, as an existence proof (or 'smoking gun'), at least one plausible, concrete real-world scenario where this causes a major problem before we judge that it's 'obviously' a bad idea.)
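use std::{mem, slice};
fn serialize<T: ?Sized>(storage: &mut [u8], val: &T) {
    let size = mem::size_of_val(val);
    // View the value as raw bytes; `size` is whatever size_of_val reports.
    let bytes = unsafe { slice::from_raw_parts(val as *const T as *const u8, size) };
    storage[..size].copy_from_slice(bytes);
}
fn deserialize<T: ?Sized>(storage: &[u8], val: &mut T) {
    let size = mem::size_of_val(val);
    let bytes = unsafe { slice::from_raw_parts_mut(val as *mut T as *mut u8, size) };
    bytes.copy_from_slice(&storage[..size]);
}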
extern {
type Foo;
fn alloc_foo() -> *mut Foo;
}
// somewhere:
let original_foo: &Foo = ...;
let new_foo: &mut Foo = unsafe { &mut *alloc_foo() };
let buf: &mut [u8] = ...;
serialize(buf, original_foo);
deserialize(buf, new_foo);
// now new_foo contains uninitialized data.
Ah I see. In that case the part which 'does know' the size is alloc_foo, which sounds realistic enough. I'm convinced, thanks again!
Unrelatedly, I want to re-raise the question of whether we want to deprecate size_of_val and replace it with something which returns an Option. That would take some of the edge off of size_of_val panicking which nobody likes.
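One possible shape for an Option-returning query, sketched on stable Rust. The trait name TrySizeOfVal, its method, and the Opaque stand-in are all hypothetical; no such API exists in std, and without specialization the impls have to be written per type rather than derived automatically.

```rust
use std::mem;

/// Stand-in for an extern type (assumption: zero-sized struct idiom).
#[repr(C)]
pub struct Opaque {
    _private: [u8; 0],
}

/// Hypothetical trait, not a std API: reports a size only when one is known.
pub trait TrySizeOfVal {
    fn try_size_of_val(&self) -> Option<usize>;
}

impl<T> TrySizeOfVal for [T] {
    fn try_size_of_val(&self) -> Option<usize> {
        Some(mem::size_of_val(self))
    }
}

impl TrySizeOfVal for str {
    fn try_size_of_val(&self) -> Option<usize> {
        Some(mem::size_of_val(self))
    }
}

impl TrySizeOfVal for Opaque {
    // The real size lives on the foreign side, so report "unknown"
    // instead of panicking or returning a made-up number.
    fn try_size_of_val(&self) -> Option<usize> {
        None
    }
}

fn main() {
    let xs: &[u16] = &[1, 2, 3];
    assert_eq!(xs.try_size_of_val(), Some(6));
    assert_eq!("abc".try_size_of_val(), Some(3));
    assert_eq!(Opaque { _private: [] }.try_size_of_val(), None);
}
```

Callers that need a size would still have to decide what to do with None, which is the "people will just unwrap" concern in another form.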
extern type is a special-purpose feature that exists for FFI
could we keep this unstable until we have a custom DST experiment then
The libs team hopes to stabilize relatively soon (a subset of) allocator APIs after changing them to use *mut void to represent pointers to allocated memory, with void (name to be bikeshedded) an extern type. The type being !Sized is valuable to prevent the use of <*mut _>::offset without first casting to another pointer type, but the pointers must be thin.
So while FFI was a primary motivation it's not the only case when extern types might show up, and it would be nice to be able to stabilize them without waiting for a full design for custom DSTs.
I think of 0 as a "safest possible default value" -- that is, if someone asks for the size_of_value of an extern type, gets 0, and proceeds to read and write 0 bytes to and from memory, the effect will be that of a no-op, which is unlikely to actually cause any problems.
If you are ever asking for the size of an extern type, something has gone awfully wrong somewhere. This size is by definition not knowable. 0 is most certainly not a safe choice if e.g. the offset is used to get an address that definitely lives "after" the extern data in memory; the code would instead overwrite that data which is rather not a safe choice.
@whitequark how would reporting the (incorrect!) size 0 be helpful with deserialization? If you attempt to deserialize a type of which you do not know the size, and that deserialization somehow needs the size, then you are kind of in a hard place and something went wrong somewhere. "Just go on and pretend nothing happened" is not how Rust solves these kinds of problems.
@kennytm How does Rust even compute the layout of a struct like
struct TerribleOpaque {
a: u8,
b: Opaque,
}
given that rustc does not know the alignment of Opaque either? I expect such type definitions to be illegal. And vice versa, if rustc somehow does come up with a layout and a choice for the offset of b, then it can just use that value when doing &c.b at run-time. Field access will never panic; it compiles (AFAIK) to a constant-offset operation because the offset of the field is computed at compile-time, never at run-time.
Field access will never panic; it compiles (AFAIK) to a constant-offset operation because the offset of the field is computed at compile-time, never at run-time.
No, the offset &c.b can be computed at run-time when the field is a DST. Check this:
let y16: &GenericThingy<dyn Debug> = &GenericThingy { c: 10u8, d: 20u16 };
let y32: &GenericThingy<dyn Debug> = &GenericThingy { c: 30u8, d: 40u32 };
assert_eq!(
(&y16.d as *const _ as *const u8 as usize) - (y16 as *const _ as *const u8 as usize),
2
);
assert_eq!(
(&y32.d as *const _ as *const u8 as usize) - (y32 as *const _ as *const u8 as usize),
4
);
Although y16 and y32 have the same type, the offset of &self.d is different.
Currently, the offset of this DST field d: T of a DST struct GenericThingy<T> is computed by the compact-size-of the sized prefix, rounded up to the alignment of the DST field type T. Therefore, to compute the offset of the field, we must require the type T to have an alignment derivable from its metadata only. In the Custom DST proposal this means T: AlignFromMeta.
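The rounding rule described above can be checked against the earlier y16/y32 example. This sketch adds #[repr(C)] to pin down the layout, which is an assumption on my part (the original struct had no repr attribute), and the helper names round_up, actual_offset and predicted_offset are hypothetical.

```rust
use std::fmt::Debug;
use std::mem::align_of_val;

// `repr(C)` is an assumption added here so the layout is guaranteed.
#[repr(C)]
struct GenericThingy<T: ?Sized> {
    c: u8,
    d: T,
}

/// Rounds `prefix_size` up to the next multiple of `align` (a power of two).
fn round_up(prefix_size: usize, align: usize) -> usize {
    (prefix_size + align - 1) & !(align - 1)
}

/// Measures the actual byte offset of field `d` behind a reference.
fn actual_offset<T: ?Sized>(g: &GenericThingy<T>) -> usize {
    let base = g as *const GenericThingy<T> as *const u8 as usize;
    let field = &g.d as *const T as *const u8 as usize;
    field - base
}

/// Predicts the offset of `d`: the sized prefix is one `u8`, rounded up to
/// the alignment that `align_of_val` recovers from the pointer metadata.
fn predicted_offset<T: ?Sized>(g: &GenericThingy<T>) -> usize {
    round_up(1, align_of_val(&g.d))
}

fn main() {
    let y16: &GenericThingy<dyn Debug> = &GenericThingy { c: 10u8, d: 20u16 };
    let y32: &GenericThingy<dyn Debug> = &GenericThingy { c: 30u8, d: 40u32 };
    assert_eq!(actual_offset(y16), predicted_offset(y16));
    assert_eq!(actual_offset(y32), predicted_offset(y32));
    assert_eq!((actual_offset(y16), actual_offset(y32)), (2, 4));
}
```

For an extern type there is no metadata for align_of_val to consult, so the prediction step has no input, which is the crux of the objection.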
So yes TerribleOpaque is illegal. However,
- The definition GenericThingy is clearly legal,
- Since there is no DynSized or AlignFromMeta, there is nothing blocking us from instantiating GenericThingy<Opaque>
- Unless you introduce post-monomorphization error, &c.d should have the same compile-time behavior whether it is GenericThingy<u8>, GenericThingy<[u8]> or GenericThingy<Opaque>.
This means either &c.d must panic at runtime, or choose a fallback alignment such as 1 or align_of::<usize>().
could we keep this unstable until we have a custom DST experiment then?
I'm not inclined to wait. As @SimonSapin said, the FFI need is real now. Also, I'd like to drill into what specific choices around Custom DST are being forced here. It seems that what we are deciding is actually relatively narrow:
Will we try to narrow the range of types on which you can invoke size_of_val?
We are leaning towards "no" on that particular question, but that does not necessarily imply that all custom DST types must implement DynSized. We might, for example, say that size_of_val and friends use specialization to check for what sort of trait the reference implements (if any) and panic if there is no such trait implemented -- presuming that always applicable impls work out like I think they will, that would be eminently doable (and @aturon had an exciting idea for building on that work, too, that helps here).
Even if we did say that everything must implement DynSized, then it seems like we are distinguishing a class of types (including at least extern type) for which said implementations unconditionally panic. We are saying that it is not worth distinguishing that class soundly in the trait system, but we could use lints to capture that class when generics are not involved. (And go further with monomorphization-time lints, if desired.)
Note that making align_of_val panic also means that field access will potentially panic:
Yeah, good point! This would also be a consequence of custom DST. I think this strengthens the case for "hard abort" and not panic -- it seems like predicting which field accesses could panic would be quite subtle, and a potential optimization hazard. (It may also argue for a monomorphization-time lint.)
If we did opt for "hard abort" instead of panic, I would say that the rule is:
Custom DST code is not permitted to panic (much like panicking across an FFI boundary). We will dynamically capture such panics and convert them into a hard abort.
UPDATE: Ignore this, it doesn't work because of back-compat; you can do coercions in a generic context, obviously.
Hmm I wonder if we can modify the definition of CoerceUnsized to prevent "unsizing" a final field into an extern type altogether? That would, I believe, avoid the concern about field access (and moves more towards the specialized-based interpretation of size_of_val I proposed here).
That is, we might have a trait DynSized, which is not implemented for extern type, and we say that you cannot use coerce unsized unless the target type implements it. But it is not required to invoke size_of_val.
Unrelatedly, I want to re-raise the question of whether we want to deprecate size_of_val and replace it with something which returns an Option. That would take some of the edge off of size_of_val panicking which nobody likes.
I see this as a separate question, but I am sympathetic to your desire. That said, I agree also with @joshtriplett that in many cases one will just unwrap the result -- at minimum, we ought to add some sort of function to readily test if a type has a defined size/alignment, so you can code defensively (even if we are not going to say you must).
This means either &c.d must panic at runtime, or choose a fallback alignment such as 1 or align_of::<usize>().
Thanks, I clearly had not thought this through enough.
However, returning any arbitrary (and hence wrong!) fixed alignment seems catastrophic for the case you described -- it would let us compute the wrong address, as C code that knows the actual extern type could end up with a different layout than we do. So, doesn't your example show that we have to panic or abort?
I think strengthens the case for "hard abort" and not panic -- it seems like predicting which field accesses could panic would be quite subtle, and a potential optimization hazard. (It may also argue for a monomorphization-time lint.)
Agreed -- not just because of optimizations, but also because unsafe code has to be very aware where panics could be raised, for exception safety purposes.
Would these interact in any way with NVPTX extern __shared__ array types? There are two flavors of __shared__ array types, static (non extern) and dynamic (extern).
fn kernel() {
let mut a: #[shared] [f32; 16];
// ^^ This array is shared by all threads in a thread-group
// Its size is fixed at compile-time and it is the same for
// all kernel invocations.
let mut b: #[shared] [u8];
// ^^ This array is shared by all threads in a thread-group.
// It has a dynamic size that is constant during the invocation
// of this kernel. Each kernel launch must set its size, but each time
// this kernel is launched this array can have a different length. This
// basically produces a pointer. The user is responsible for tracking
// the size of these arrays, e.g., by passing it as an argument to the
// kernels.
}
So it looks to me that this wouldn't interact with static __shared__ arrays because size_of_val would just return mem::size_of. However, the size of extern __shared__ arrays is not known, not at compile-time, and at least for nvptx not at run-time either: the user is in charge of passing the array size around as a kernel argument and it is a "common" idiom to pass a single integer from which multiple sizes are computed inside the kernel. So I assume size_of_val would need to result in an error for these.
Field access being potentially aborting feels very sad.
On the other hand, the only way this can happen is when the struct type is uninhabitable, in which case the field access was dubious anyway. So this basically raises the old question of when calling size_of_val or doing raw pointer lvalue access is valid.
I would prefer that to be documented somewhere - obviously, we want to access fields of e.g. uninitialized structs, as in e.g. RcFromSlice.
The recent DynSized RFC proposed
- T to mean T: Sized
- T: ?Sized to mean T with no trait bounds
- T: ?Sized + DynSized to mean T: DynSized
Where extern types would be !DynSized, but then only adding + DynSized to size_of_val in the new epoch, leaving it as a lint+panic for now. Since this is an extension of what's being proposed here and could be added later, is the idea still on the table?
Since we want to compile together crates that use different epochs/editions, opting into a new edition can only affect "superficial" crate-local aspects of the language like syntax, not public APIs.
My basic concern is I feel a number of various issues are pushing us in the direction of more fundamental / opt-in traits, but the resistance to opt-in traits is such that we're throwing around ad-hoc solutions like lints instead. I get that ? is annoying to teach, but I'll take principled weirdness over banal but endless machinations. The grapple scares me more than the fall down this slippery slope.
I'll admit {size,align}_of_val isn't that interesting on its own. But to show my cards, I was excited about contributing in part to this RFC because I finally had some issue by which to force the topic of DynSized in particular, and more special traits in general. I guess you all didn't take the bait :). Now, I suppose I'll ask whether, if we had a full menagerie already, would we still bother making {size,align}_of_val defined for !DynSized types. Relatedly, if we end up adding DynSized later, would we deprecate the {size,align}_of_val we have today? I realize "no" for the first and "yes" for the second aren't ironclad reasons to make DynSized now, but I'm still curious about the answer.
Even if we did say that everything must implement `DynSized`, then it seems like we are distinguishing
Mmm if all types must implement `DynSized`, then we're not distinguishing anything. What we are doing is providing a principled way of using an existing feature (the trait system) to allow users to write the requisite "hook". That alone is reason for a `DynSized` trait in my mind.
...a class of types (including at least `extern` type) for which said implementations unconditionally panic.
Surely you don't mean the salient attribute of `!DynSized` types is that querying the size panics? It's that they have no dynamically or statically known size. Panicking is just an enforcement mechanism with no intrinsic meaning.
opting into a new edition can only affect "superficial" crate-local aspects of the language like syntax, not public APIs.
We could deprecate and replace `size_of_val` then. Call it `dynsize_of_val`.
@canndrew (I am responding to two comments at once)
The recent `DynSized` RFC proposed
- `T` to mean `T: Sized`
- `T: ?Sized` to mean `T` with no trait bounds
- `T: ?Sized + DynSized` to mean `T: DynSized`
The recent DynSized RFC proposed ... We could deprecate and replace `size_of_val` then. Call it `dynsize_of_val`.
I believe that this future could still be on the table. This is what I was trying to say in this comment when I wrote:
It seems that what we are deciding is actually relatively narrow:
Will we try to narrow the range of types on which you can invoke `size_of_val`?
I feel very strongly that we do not want `T: ?Sized` to actually mean `T: DynSized`. However, I could imagine that we introduce `DynSized` as an "ordinary" trait and introduce `dynsize_of_val` (or whatever) that requires it -- and then specify that `size_of_val` is implemented by using specialization to invoke `dynsize_of_val` when possible and aborting/panicking otherwise (I lean more and more towards abort, personally).
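To make the shape of that proposal concrete, here is a minimal sketch on stable Rust. The `DynSized` trait and `dynsize_of_val` below are hypothetical user-level stand-ins (no such items exist in `std`), the impls simply defer to existing size information, and an extern type would be modelled by a type that has no `DynSized` impl at all.

```rust
use std::mem;

/// Hypothetical stand-in for the proposed `DynSized` trait: implemented only
/// by types whose size can be computed at run time from a reference.
pub trait DynSized {
    fn dyn_size(&self) -> usize;
}

// An ordinary sized type knows its size statically.
impl DynSized for u32 {
    fn dyn_size(&self) -> usize {
        mem::size_of::<u32>()
    }
}

// Slices compute their size from the length stored in the fat pointer.
impl<T> DynSized for [T] {
    fn dyn_size(&self) -> usize {
        mem::size_of_val(self)
    }
}

// `str` is a byte slice underneath, so its size is its length in bytes.
impl DynSized for str {
    fn dyn_size(&self) -> usize {
        self.len()
    }
}

/// Hypothetical `dynsize_of_val`: unlike `size_of_val`, it only accepts
/// types that implement `DynSized`, so misuse is a compile-time error.
pub fn dynsize_of_val<T: ?Sized + DynSized>(val: &T) -> usize {
    val.dyn_size()
}
```

A type with no `DynSized` impl, which is the role an extern type would play, is rejected by `dynsize_of_val` at compile time instead of panicking or aborting at run time.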
Alternatively, thinking more about lints -- it is certainly plausible to lint on calls to `size_of_val` unless `T: DynSized` (one could even imagine generalizing this). That is important because we also do have to figure out the field access question. We can't deprecate field accesses -- and they are legal today knowing only that `T: ?Sized` (i.e., we do not require `T: DynSized`). But we could lint aggressively there, thus encouraging `T: DynSized` to proliferate.
Worth thinking over. But also not blocking further progress on `extern type`, I think.
My basic concern is I feel a number of various issues are pushing us in the direction of more fundamental / opt-in traits, but the resistance to opt-in traits is such that we're throwing around ad-hoc lints instead.
Can you be more explicit? It seems like this is one precise case where we are talking about lints, specifically because it is narrow and we don't see another way out of the backwards compatibility box, but in other cases where we had thought about adding "implicit" traits (notably, `?Move`), I don't believe lints are on the table. Instead, we've found a way to add the desired functionality in a "non-infectious" fashion (using `Pin`). Are there other cases I'm overlooking?
That said, I do think there is a constant tension, one that Rust always has to walk: how to get the maximum bang for our static analysis buck, and I feel no shame about keeping lints as part of the toolbox.
Can you be more explicit?
Sorry, I meant "ad-hoc solutions like lints". edited the above accordingly.
I hadn't yet seen `Pin`. Glad there is a safe and total way out of that corner, but it too strikes me as a bit of a monkey patch; see the final comment rust-lang/rfcs#2349 (comment) which makes one wonder whether all collections will need a `Pin` variant, leading to an ecosystem split!
I realize there's a steep drop off in priority along [generators, extern types, custom DSTs, out pointers and other linear types]. But the fact that implicit traits keep coming up gives me pause to let them go: I now see them all as one problem and thus our current trajectory as many unrelated piecemeal solutions. Also, I find compelling the observation (not mine, maybe in rust-lang/rfcs#2255 ?) that more `?`-traits probably makes them less confusing.
Also, `?`-traits work like Cargo features in that they're the only general way to backwards-compatibly grow the language in a negative direction: reducing requirements rather than adding functionality, and I find that the more interesting direction for language evolution.
This all boils down to a difference in opinion that's been around for years, and probably cannot be bridged. Your previous comment on positive `DynSized` gives me hope in this specific case. If we have far more `DynSize` than `?DynSize` annotations in the end, I wonder what is achieved, but at least we can meaningfully speak about sizing.
OK, I just want to say that I am very strongly of the opinion that Rust should use built-in traits like `DynSized` to express the difference in capabilities between `extern` types and the dynamically-sized types that currently exist in Rust (i.e. trait objects and slices). All of the alternatives that I have seen (panicking, returning `Option<usize>` from `size_of_val`, post-monomorphisation lints) are less powerful, and the issues with `?`-traits that people keep bringing up need to be tested and not just speculated about. We need to at least try doing things the builtin-traits way and see what it's like, and see what the ergonomic impact is like, and see if we can reduce it, before settling for something inferior.
Maybe I'm overreacting, I just got the sense from reading some of the comments in this thread that something might be done in order to get `extern` types out the door that might put us in a backward-compatibility trap later on. Now that I have more time to work on Rust, I'm planning on writing an eRFC to add DST-specific builtin traits like `DynSized` and `SizeFromMeta`, so we can start experimenting with them and Custom DST.
@mikeyhew These alternatives are definitely less powerful, but as the maxim goes: always use the smallest tool for the job.
There are a host of global factors to consider when extending the language, especially when it comes to introducing a new fundamental distinction. The payoff in this case seems incredibly tiny. And we do have plenty of first-hand experience with `?` traits in the form of `Sized`.
I wonder if you could spell out, in terms of practical impact, why you feel so strongly about built-in traits?
I'll start a list.
- Opt-out traits are less impactful for those that don't care. Don't care about weird FFI types? Never write `?DynSized`. If somebody else wants to use your crate for those, they can send the `DynSize` PR. Cf. with opt-in `DynSize` and deprecated `{size,align}_of_val`, now everyone needs to care if the new replacement methods are to get traction. This is the exact opposite of what @withoutboats said.
- `DynSize` is minor now, but seems like an important part of any custom DST proposal. Custom DSTs are very useful for things we care about.
- Opt-out traits are like Cargo default features. They allow a completely different way of changing the library/language by reducing dependencies/assumptions instead of adding features. They are the only backwards-compatible way we have to do that, in fact.
  I am personally interested in this sort of thing. It's very similar to portability, for example. We want rust crates that don't or barely need a normal OS to also support weird platforms without annoying the crate author. It is an open "ecosystem sociology" question whether this can be pulled off. Similarly a bunch of us want truly unsized types, custom DSTs, linear types, out pointers, and other weird things without pissing off regular users. Opt-out traits, again, seem the best and only way to do that.
- I strongly agree with whoever wrote that having more opt-out traits is good for pedagogy - it was a great point that I hadn't previously thought of at all. Right now `Sized`, being the one weird trait, isn't really part of a general pattern. DSTs, `Sized`, and opt-out traits are probably all one mess in most people's heads. Having more opt-out traits teases the concepts apart: who knows which opt-out trait you'll grok first, and now that can help you learn the others.
  I think it's illustrative that you wrote "built-in traits" above @aturon. We have many different types of magic traits today, from the most normal `Copy` (requires impl), to `Send`/`Sync` (implicit impl but not default bounds), and `Sized` (implicit impl and default bound). Making sure every weird class has multiple examples and a dedicated name (better than old "OIBIT"! https://internals.rust-lang.org/t/pre-rfc-renaming-oibits-and-changing-their-declaration-syntax/3086/15) should clear things up.
DynSize is a minor now, but seems like an important part of any custom DST proposal. Custom DSTs are very useful for things we care about.
@Ericson2314 I've yet to see any custom DST proposal that involved any types whose size was completely unknown at runtime. Why is this so critical?
@cramertj Custom DSTs are `DynSized` by definition. It's that implementing a trait is by far the most natural way to add the right hook. I want a repeat of `Copy: Clone`, not `Drop`. `Drop` got it wrong because, as all types (today) can be dropped, the question is when is the drop automatic and when does it require user code; I'd have preferred a `Forget: Destroy`.
@cramertj Also some custom types might have really expensive ways to calculate the size (C strings, for example). For performance-conscious users, it may be better to not implement `DynSized` and do all size look-ups by hand. IMO, all implicit operations being `O(1)` is a defensible if extreme position to take.
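As a rough illustration of that cost, the sketch below computes the size of a NUL-terminated C string behind a thin pointer. `c_str_size` is a hypothetical helper, not a standard API; it has to walk the whole buffer, so the lookup is O(n) in the string length rather than O(1).

```rust
use std::os::raw::c_char;

/// Hypothetical helper: size in bytes (including the trailing NUL) of the
/// C string at `ptr`, found by scanning for the terminator.
///
/// # Safety
/// `ptr` must point to a valid NUL-terminated buffer.
pub unsafe fn c_str_size(ptr: *const c_char) -> usize {
    let mut len = 0;
    // Linear scan: the only size information is the terminator itself.
    while unsafe { *ptr.add(len) } != 0 {
        len += 1;
    }
    // Count the terminator as part of the object.
    len + 1
}
```

Hiding a scan like this behind an implicit size query is what the O(1) argument objects to; keeping it as an explicit call makes the cost visible at the call site.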
Cf. some people arguing similarly about lock guards being linear and needing to explicitly consume `unlock` in the past (not that I necessarily agree with that lock guard example).
- I don't think "always use the smallest tool for the job" applies here. The decreased power of lints is directly worse for users. Cf. non-null lints vs. `Option` in other languages. Lints are easily lost amid other warnings, and the fact is only some users will care. This means while individual code bases might obey them, the ecosystem as a whole cannot be trusted to uphold the invariants the lints try to maintain. This is a real loss for fostering an ecosystem of `DynSize` abstractions, or whatever the niche feature is, as for such niche things, being able to sync up few and scattered programmers and form a community is all the more important.
- Ecosystem-wide enforcement is also good for the "regular users don't need to care" goal. If some library happens to use truly unsized types, and the consumer is unaware, they could face some nasty unexpected panics. With `?DynSize` they do get bothered with a compile time error they didn't expect, but that is much less nasty to deal with than a run-time bug. If they don't want to learn `?DynSized`, they can go use a different library; better to have the opportunity to do that up front than after you're tied to the library too deeply because it took a while to excise the `{size,align}_of_val` panic.
As useful as extern types would be for me, having extern types without all the proper language machinery to enforce their unsizedness would result in a partial solution that is marginally better than the current partial solution of using zero variant enums. Extern types don't even solve the real problems for me such as the inability to properly specify that a struct is opaque beyond the first few fields or that a struct ends in dynamically sized or unsized data yet has a thin pointer. I want a full comprehensive plan for how to get to a full solution to those problems. What I don't want is any partial solution being stabilized early without being part of the full comprehensive plan, because that just leads to Rust being locked into something sub par.
I wonder if you could spell out, in terms of practical impact, why you feel so strongly about built-in traits?
I want to create safe data structures for DSTs, like the DSTVec data structure that I posted about on Reddit a while ago (probably over a year ago), which stores DSTs contiguously in memory to avoid boxing. It requires `AlignFromMeta` and either `SizeFromMeta` or `SizeFromRef`, and I'd like to be able to write those requirements as a trait bound.
A few months ago, I came up with an idea that avoids the `?` altogether. I'm referring to the idea that if a `Sized`-family trait appears in the list of trait bounds, the default `Sized` bound is removed. I'd like to explore that by implementing it in tree and seeing what it's like to use it.
Like @Ericson2314, I don't think we want to pick the "smallest tool for the job" here, if "smallest" means the least powerful, least general, or least extensible. The Rust team has been pretty good about having a rigorous design process, and never just adding a language feature when the tools to implement it can be added instead, and the original feature possibly added as a syntactic sugar for something more expressive. In this case, the tools we are talking about are the `Sized`-family traits, and an `extern` type is really just a type that doesn't implement them.
@mikeyhew Overall I very much agree with all that. One thing is though:
I'm referring to the idea that if a `Sized`-family trait appears in the list of trait bounds, the default `Sized` bound is removed. I'd like to explore that by implementing it in tree and seeing what it's like to use it.
This would require us to design the entire hierarchy at once. Otherwise, if we add another trait later, that one can't be disabled by the older ones, for backwards compatibility. Better to have just one notion of `?`-traits than a coarser staircase thing I think.
(BTW the one notion can be thought of as just one type of `?`-trait and a "flat" default bound `Sized + DynSized + ...` such that any trait opted out also removes any other part of the default bound implying it.)
Note: @joshtriplett opened #49708 to discuss the specific question of "what should the behavior of `size_of_val` (and `align_of_val`) be when applied to extern types" -- that is, the specific question at hand (which obviously intersects larger questions around `DynSized`).
(I didn't see him announce that here.)
So, the name currently decided on by @SimonSapin for the allocated memory type seems to be `Opaque`. I concur that `Void` is a dubious name, but I don't think `Opaque` is great, mainly due to its complete lack of descriptiveness. I propose one of the following:
- `Blob`
- `Mem`
- `MemBlob`
- `MemChunk`
- `OpaqueMem` (not my favourite, but at least slightly more explicit than just `Opaque`)
I'd probably lean towards `Mem`, since it's the most pithy, but the others are okay too.
@alexreg You probably meant this for #49668 (tracking the `GlobalAlloc` trait) rather than this issue (tracking extern types in the language), but I personally don't care much about the `Opaque` name: I only picked one so we could make progress. In my mind these names are pretty much all synonymous to `Thing`.
@SimonSapin Yes, I did. Not sure how I got here when I thought I clicked Alex Crichton's link, oops. Want me to repost my comment there, or leave it here (this issue is rather related anyway)?
Anyway, glad to hear you're not too bothered. I understand you just wanted to get that PR merged with a name that wasn't `Void`, so fair enough. Perhaps we can wait for a few people to voice their preferences, and you can pick one from the above, if that's okay with you? :-)
At this point I'd even prefer if someone else did the cat herding consensus gathering and picked one name (:
@SimonSapin @alexreg Can you please have the discussion about `GlobalAlloc` naming in that issue?
I just realized `extern type` can be used to easily & safely emulate single-inheritance hierarchies:
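// Requires nightly: #![feature(extern_types)]
use std::ops::Deref;

extern "C" {
    pub type Base;
}
// `Derived` has the same layout as `Base`, so pointers can be cast between them.
#[repr(transparent)]
pub struct Derived(Base);
impl Deref for Derived {
    type Target = Base;
    fn deref(&self) -> &Base { &self.0 }
}
impl Derived {
    // The caller must guarantee that `base` really refers to a `Derived`.
    pub unsafe fn unchecked_downcast(base: &Base) -> &Self {
        &*(base as *const Base as *const Self)
    }
}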
Of course, using this requires using references in FFI, but that was already part of the plan for the usecase where I noticed this would be an option, i.e.:
Lines 463 to 479 in 01dbfda
cc @rust-lang/lang Nominating for discussion this week, as to what's needed for potential stabilization. If someone has time to summarize the discussion so far ahead of the meeting, that'd be most helpful!
Another technique using this feature:
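// Requires nightly: #![feature(extern_types)]
use std::marker::PhantomData;

extern "C" { type Opaque; }
struct InvariantOpaque<'a> {
    // `&'a mut &'a ()` makes the struct invariant in `'a`.
    _marker: PhantomData<&'a mut &'a ()>,
    _opaque: Opaque,
}
pub struct Foo<'a>(InvariantOpaque<'a>);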
This is sort of a hacky way of adding a lifetime parameter to an `extern type` (which we could also natively support, I suppose, but as you can see, it can be emulated; type parameters would also work).
What can we do with this?
- an "owned" pointer, akin to C++ `unique_ptr<Foo<'a>>`, is `&'a mut Foo<'a>`
  - the constructor can mark with `'a` the lifetimes that foreign code would capture
  - if a constructor returns this type and a destructor takes it, the latter can't ever receive a reborrow, because borrows are necessarily `&'b mut Foo<'a>` with `'b` strictly shorter than `'a`, and `'a` can't be shortened because it's in an invariant position
  - therefore, `&'a mut Foo<'a>` can only be moved (when written with two `'a`s like that)
  - caveat 1: can't implement `Drop` on `&'a mut Foo<'a>`, need a wrapper struct
  - caveat 2: using it as a field in a wrapper that implements `Drop` means having to use `&mut *(self.field as *mut _)` because `Drop` impls can't move out their fields
- most functions can take `&Foo` (or maybe even `&mut Foo`, but that's your choice)
- if you need to refer to what the foreign object borrowed, use `&Foo<'a>`
  - e.g. `fn foo_field(foo: &Foo<'a>) -> &'a Bar;`
- if you need to refer to the foreign object's own lifetime, use `&'a Foo`
  - e.g. `fn foo_iter(foo: &'a Foo) -> &'a mut FooIter<'a>;`
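The ownership pattern in this list can be tried on stable Rust with a zero-sized stand-in for the extern type. Everything below is an assumption for illustration: `Opaque` is an ordinary zero-sized struct rather than a real `extern type`, and `foo_new` / `foo_destroy` are hypothetical constructor and destructor functions standing in for foreign code.

```rust
use std::marker::PhantomData;

// Stand-in for `extern { type Opaque; }`, which needs nightly.
struct Opaque {
    _private: [u8; 0],
}

struct InvariantOpaque<'a> {
    // `&'a mut &'a ()` makes the struct invariant in `'a`.
    _marker: PhantomData<&'a mut &'a ()>,
    _opaque: Opaque,
}

pub struct Foo<'a>(InvariantOpaque<'a>);

/// Hypothetical constructor: the returned "owned" pointer is tied to the
/// lifetime of whatever the foreign object captured.
pub fn foo_new<'a>(_captured: &'a u32) -> &'a mut Foo<'a> {
    let foo = Foo(InvariantOpaque {
        _marker: PhantomData,
        _opaque: Opaque { _private: [] },
    });
    Box::leak(Box::new(foo))
}

/// Hypothetical destructor: it demands `&'a mut Foo<'a>`, so a strictly
/// shorter reborrow `&'b mut Foo<'a>` is not accepted here.
pub fn foo_destroy<'a>(foo: &'a mut Foo<'a>) {
    // SAFETY: `foo` was produced by `Box::leak` in `foo_new`.
    unsafe { drop(Box::from_raw(foo as *mut Foo<'a>)) }
}
```

Passing the pointer to `foo_destroy` uses it up for the whole of `'a`, so the borrow checker rejects any later use of it, which is the "can only be moved" behaviour described in the list.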
Extern types don't even solve the real problems for me such as the inability to properly specify that a struct is opaque beyond the first few fields or that a struct ends in dynamically sized or unsized data yet has a thin pointer. I want a full comprehensive plan for how to get to a full solution to those problems. What I don't want is any partial solution being stabilized early without being part of the full comprehensive plan, because that just leads to Rust being locked into something sub par.
@retep998 Have you tried creating a struct with an `extern type` as its last field?
@irinagpopa has used the technique from my previous comment in #52461 with great success.
EDIT: an earlier example of successful use of prefixes, this one even has data:
Lines 594 to 609 in f686885
What's the status of the compile-time detection prohibiting `size_of_val` on an extern type?
I was going to fix rust-lang/nomicon#29 within the next few days by changing that section to recommend a ZST instead. Should I wait because this might get stabilized soon-ish? :D
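For context, the zero-sized-struct workaround mentioned here can be sketched as follows. `Handle` is a hypothetical name for some opaque C type; the pattern (a `#[repr(C)]` struct with a private zero-length array field) is a common stable substitute for an extern type. The sketch also shows why it is only a partial solution: the compiler believes the pointee has size 0.

```rust
use std::mem;

/// Hypothetical opaque FFI type, only ever used behind a pointer.
/// The private zero-length field prevents construction outside this module.
#[repr(C)]
pub struct Handle {
    _private: [u8; 0],
}

/// What the compiler thinks the size of the pointee is. For a real foreign
/// object this answer (0) is simply wrong, which is the gap extern types
/// are meant to close.
pub fn claimed_size(h: &Handle) -> usize {
    mem::size_of_val(h)
}
```

With an extern type, the same query would have no silently wrong answer to give, which is why its behaviour (lint, panic, abort, or compile error) is under discussion in this thread.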
@joshtriplett AFAIK no lint or panicking `size_of_val` solution has been implemented.
If @aturon is fine with it (iff we want this as part of the edition), I could try implementing those.
@nikomatsakis @oli-obk Do you have opinions on whether to have a check within the `size_of_val` safe wrapper for the intrinsic, or in the intrinsic itself?
I think we can emit panics anywhere calls are involved, since we have the unwind edges and whatnot fully set up (`TerminatorKind::Assert` already works a bit like this), so we should be able to, with minimal work, handle the `size_of_val` intrinsic panicking inside codegen.
But miri also needs to handle it, so maybe we should add a miri error kind and use that like `Assert`?
I would have thought we'd just replace calls to the intrinsic with an `assert false` if the type is an extern type. So "inside the wrapper"?
We currently don't support `repr(align(N))` on extern types (it's not prohibited but it's ignored -- maybe we should fix that), but it could possibly be useful. The alignment is a property of pointers, so it can be known even if the pointed-to type is opaque. For example, GCC lets you do `struct __attribute__ ((aligned (8))) S;`.
But if we support that, then it would make sense for `align_of` and `align_of_val` to return the declared alignment rather than panic (unlike `dyn Trait`, an extern type is not encompassing multiple different concrete types with different alignment requirements). I think it's backwards compatible to go from panics to this interpretation later, and it doesn't change the story for `size_of_val`, but @eddyb asked me to write this up for the record.
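The point that alignment can be declared independently of size can be seen on stable Rust with a zero-sized stand-in. `S` below is a hypothetical opaque type mirroring the GCC declaration above; it is an ordinary struct, not an extern type, so this only illustrates what returning the declared alignment would look like.

```rust
use std::mem;

/// Hypothetical opaque type with a declared alignment of 8 and no known
/// contents, analogous to `struct __attribute__ ((aligned (8))) S;`.
#[repr(C, align(8))]
pub struct S {
    _private: [u8; 0],
}

/// Alignment query that needs no knowledge of the pointee's size.
pub fn declared_align(s: &S) -> usize {
    mem::align_of_val(s)
}
```

Here the alignment comes from the attribute alone, independent of whatever the foreign side stores behind the pointer.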
it's not prohibited but it's ignored -- maybe we should fix that
Yes, I'd even say this should be a blocker for stabilization.
Just a reminder that `&self.extern_type_field` invokes the `align_of_val` intrinsic indirectly to calculate the field offset, so the `assert(false)` shouldn't be placed in the safe wrapper, but in the intrinsic itself.
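The dependency of field access on `align_of_val` can be observed on stable Rust with trait objects. In this sketch `Prefixed` and `tail_offset` are hypothetical names, and the unsized tail is a `dyn Debug` rather than an extern type, so the alignment can still be read from the vtable at run time.

```rust
/// A one-byte header followed by a possibly unsized tail.
#[repr(C)]
pub struct Prefixed<T: ?Sized> {
    pub head: u8,
    pub tail: T,
}

/// Byte offset of `tail` inside `p`. Evaluating `&p.tail` forces the
/// compiler to round the offset up to the tail's (dynamic) alignment.
pub fn tail_offset<T: ?Sized>(p: &Prefixed<T>) -> usize {
    let base = p as *const Prefixed<T> as *const u8 as usize;
    let tail = &p.tail as *const T as *const u8 as usize;
    tail - base
}
```

The same `Prefixed<dyn Debug>` type places its tail at different offsets depending on the concrete value behind it; with an extern type as the tail there is no vtable to consult, which is why the field access itself becomes the problem.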
@kennytm the field access was the reason I brought up that we shouldn't panic. The intrinsic and the wrapper are function calls, which means there's a target for unwinding. But we don't have the same kind of information for field accesses, so they would create serious problems by panicking.
This is why I prefer returning a defined alignment.
@eddyb unless the `extern_type` supports `#[repr(align(n))]` I don't think returning any fixed number from `intrinsics::align_of_val()` is safe.
Without `#[repr(align(n))]`, instead of a panic, an `abort()` is probably more appropriate for field access.
Together, the implementation could be:
fn size_and_align_of_dst(...) -> (ValueRef, ValueRef) {
    ...
    match t.sty {
        ty::TyForeign(_) => (C_usize(cx, 0), bx.trap()),
        // make sure intrinsics::align_of_val() will abort the program.
        // (size_of_val() can be anything, no one should call it anyway.)
        ...
    }
}
pub fn size_of_val<T: ?Sized>(val: &T) -> usize {
    unsafe {
        if intrinsics::is_extern_type::<T>() { // if you need a panic
            panic!("extern type does not have a known size");
        }
        intrinsics::size_of_val(val)
    }
}
Should `#[repr(align(n))]` be required for extern types?