Differences between `*const T` and `*mut T`. Initially `*const T` pointers are forever read-only?
thomcc opened this issue · 27 comments
I hadn't seen this, but it's very surprising and should be documented better. Apparently, `*mut T` and `*const T` aren't equivalent: if the raw pointer starts out as a `*const T`, it will always be illegal to write to, even if nothing beyond the pointer's "memory" of its initial state is the reason for this.
See: rust-lang/rust-clippy#4774 (comment)
This is not entirely correct... something like &mut foo as *mut T as *const T as *mut T is entirely harmless. What is relevant is the initial cast, when a reference is turned to a raw pointer. I think of the pointer as "crossing into another domain", that of uncontrolled raw accesses. If that initial transition is a *const, then the entire "domain" gets marked as read-only (modulo UnsafeCell). The raw ptrs basically "remember" the way that the first raw ptr got created.
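The harmless round trip described in the quoted comment can be sketched roughly as follows. This is only an illustration under the assumption of a plain local `u8`; the function name `round_trip_write` is made up for the example.

```rust
/// Hypothetical helper: writes through a pointer that was `*const` for part of its life.
fn round_trip_write() -> u8 {
    let mut v = 5u8;
    // The initial reference-to-raw cast is `&mut` -> `*mut`, so the raw
    // pointer is created from a unique reference.
    let p_mut: *mut u8 = &mut v as *mut u8;
    // Later casts between raw pointer types keep pointing at the same place;
    // only the static type changes.
    let p_const: *const u8 = p_mut as *const u8;
    let p_again: *mut u8 = p_const as *mut u8;
    // SAFETY: `p_again` points to the live local `v`, and no reference to `v`
    // is in use while we write.
    unsafe { *p_again = 7 };
    v
}
```

Calling `round_trip_write()` returns the value written through the re-cast pointer.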
This is extremely surprising, as lots of documentation and common wisdom indicate that `*const T` and `*mut T` are identical, except that the distinction acts as a sort of lint and the variance is different.
In fact, having correct variance in your types often forces using `*const` even for mutable data (hence `NonNull` uses `*const`). The prevalence of this certainly helps contribute to programmers' belief that there's no meaningful difference and that you just have to be sure the data you write to is legal for you to write to.
A common case where this happens is a helper method that returns a pointer: you might write it just once for the `*const T` case, and use it even if you have a `&mut self` and need a `*mut T` result. I wouldn't think twice about this, mostly because the myth that they're equivalent is so widespread.
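One way to sidestep the helper-method trap is to provide two accessors, so the write-capable pointer is derived from `&mut self` instead of being cast from the read-only one. The sketch below assumes a made-up `Buffer` type and method names chosen for illustration; it is not taken from any particular library.

```rust
/// Hypothetical container used only for illustration.
struct Buffer {
    data: [u8; 4],
}

impl Buffer {
    /// Read-only pointer, derived from a shared borrow.
    fn as_ptr(&self) -> *const u8 {
        self.data.as_ptr()
    }

    /// Write-capable pointer, derived from a unique borrow rather than
    /// by casting the result of `as_ptr`.
    fn as_mut_ptr(&mut self) -> *mut u8 {
        self.data.as_mut_ptr()
    }
}

/// Hypothetical helper: overwrites the first byte and reads it back.
fn overwrite_first(buf: &mut Buffer, value: u8) -> u8 {
    let p = buf.as_mut_ptr();
    // SAFETY: `p` points to the first element of `buf.data`, which is live
    // and uniquely borrowed for the duration of this function.
    unsafe { *p = value };
    // SAFETY: same allocation, read only.
    unsafe { *buf.as_ptr() }
}
```

With a `Buffer { data: [1, 2, 3, 4] }`, calling `overwrite_first(&mut b, 9)` leaves the data as `[9, 2, 3, 4]`.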
Ralf's comment here even further propagates this myth, in a thread explicitly asking about the differences here... https://internals.rust-lang.org/t/what-is-the-real-difference-between-const-t-and-mut-t-raw-pointers/6127/18 :
I agree. *const T and *mut T are equivalent in terms of UB.
More broadly, nowhere in the thread does the 'once `*const`, always `*const`' behavior come up, just that you need to make sure that you maintain the normal Rust rules (e.g. the rules which would apply had you started out with a `*mut T`).
I looked and in none of Rust's reference material could I find any mention of behavior like this. This is very surprising, and I had been under the impression that optimising accesses to raw pointers wasn't beneficial enough for Rust to care strongly about them.
I also think this breaks a lot of existing unsafe code given how widespread the belief that they are equivalent is, and makes non-const-correct C libraries much thornier to bind to :(
if the raw pointer starts out as a *const T it will always be illegal to write to
More broadly, nowhere in the thread does the 'once *const, always *const'
These statements aren't true. The only thing that matters is how you got the pointer. For example, this is 100% correct Rust code:
let mut v = 5u8;
let ptr: *const u8 = unsafe {std::mem::transmute(&mut v)};
unsafe {*(ptr as *mut u8) = 7;}
assert_eq!(v, 7);
So even though it started as a `*const u8`, you're still allowed to write into it, because you got the pointer from a unique (`&mut`) reference and not a shared reference.
This is, I think, a duplicate of #227. I agree it is a problem. I just do not know a good solution.
So even though it started as a *const u8 you're still allowed to write into it, because you got the pointer from a unique(mut) reference and not a shared reference
The subtle aspect of this is that `x as *const _` is basically the same as `&*x as *const _`, i.e., `as *const _` always goes through a shared reference.
The subtle aspect of this is that `x as *const _` is basically the same as `&*x as *const _`, i.e., `as *const _` always goes through a shared reference.
Ohhh that's what he was talking about, I'm sorry I misunderstood you @thomcc
I hope I'm not off topic. I don't think I am, since `NonNull` is a wrapper over `*const`. The `NonNull` documentation states:
Notice that NonNull has a From instance for &T. However, this does not change the fact that mutating through a (pointer derived from a) shared reference is undefined behavior unless the mutation happens inside an UnsafeCell. The same goes for creating a mutable reference from a shared reference. When using this From instance without an UnsafeCell, it is your responsibility to ensure that as_mut is never called, and as_ptr is never used for mutation.
I've also found this post, which basically asserts that it is entirely OK to use `NonNull` in FFI. If the pointer is nullable, then I see no special benefit in using `Option<NonNull<T>>`; I would just use `*mut T`. However, I'm interested in using `NonNull<T>` for non-nullable pointers (i.e. pointers for which the C documentation explicitly states that a null value must not be provided), as it would provide additional type safety. This is just for the scenario where Rust code is calling C code, not vice versa.
And, now I'm confused :)
Hm, we could say that this isn't a `NonNull`-related issue. We can still have the same issue in FFI in this scenario:
let x = [1, 2, 3];
let y = c_fun_which_takes_ptr_and_mutates_it(x.as_ptr() as *mut _)?;
This is said to be UB as well. Therefore I find that using `NonNull<T>` is as good as using `*mut T` considering the risk of UB. Maybe it's a little better, since the documentation states the risk of UB.
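For the FFI scenario above, a version that avoids deriving the write pointer from a shared borrow might look like the sketch below. The C function is replaced by a hypothetical Rust stand-in so the example is self-contained; a real binding would be declared in an `extern "C"` block, and its exact signature is an assumption here.

```rust
/// Stand-in for a C function that writes through its argument.
/// Hypothetical: a real binding would be declared in an `extern "C"` block.
unsafe fn c_fun_which_takes_ptr_and_mutates_it(p: *mut i32, len: usize) {
    for i in 0..len {
        // SAFETY (caller contract): `p` is valid for reads and writes of `len` elements.
        unsafe { *p.add(i) += 1 };
    }
}

/// Hypothetical caller: the array binding is `mut`, and the pointer comes from
/// `as_mut_ptr` (a unique borrow) instead of `as_ptr() as *mut _`.
fn bump_all() -> [i32; 3] {
    let mut x = [1, 2, 3];
    let len = x.len();
    let p = x.as_mut_ptr();
    // SAFETY: `p` points to `len` live, initialized elements of `x`, and no
    // reference to `x` is used while the callee writes.
    unsafe { c_fun_which_takes_ptr_and_mutates_it(p, len) };
    x
}
```

The design choice is simply to make the mutation visible at the Rust call site: the binding is `mut` and the pointer is requested as mutable.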
AIUI, this actually has little to do with `*const` vs `*mut` and is about whether the pointer provenance is a `&` or a `&mut` (or no provenance).
The only tricky part is the point @RalfJung mentioned: when casting directly from a `&mut` to a `*const`, a shared reference is implicitly created.
One option would be to warn on this direct cast (`&mut` -> `*const`) (in the next edition if that would be too noisy) and require that the `&mut` -> `*mut` -> `*const` vs `&mut` -> `&` -> `*const` path is explicitly distinguished.
Then you can be safe in treating `*const` and `*mut` the same.
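Spelled out in code, the two paths that this proposal would ask authors to distinguish could look like the following sketch. The helper names are hypothetical, and only the pointer that went through `*mut` is used for writing.

```rust
/// Hypothetical helper: `&mut` -> `*mut` -> `*const`, spelled out.
fn via_raw_mut(r: &mut i32) -> *const i32 {
    r as *mut i32 as *const i32
}

/// Hypothetical helper: `&mut` -> `&` -> `*const`, spelled out.
fn via_shared(r: &mut i32) -> *const i32 {
    &*r as *const i32
}

fn explicit_paths_demo() -> (i32, i32) {
    let mut a = 1;
    let mut b = 2;
    let pa = via_raw_mut(&mut a);
    let pb = via_shared(&mut b);
    // SAFETY: `pa` was derived through `*mut`, points to the live local `a`,
    // and no reference to `a` is in use.
    unsafe { *(pa as *mut i32) = 10 };
    // `pb` went through a shared reference, so it is only read here.
    // SAFETY: `pb` points to the live local `b`.
    let read_b = unsafe { *pb };
    (a, read_b)
}
```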
Could we change `&mut T as *const T` to not go through a shared reference? That would get rid of the implicit footgun.
The tricky bit would be to keep this code working:
fn main() {
    let x = &mut 0;
    let shared = &*x;
    let y = x as *const i32; // if we use *mut here instead, this stops compiling
    let _val = *shared;
}
Currently this works because `x as *const _` is considered a read-only access.
OTOH, we do reject the `as *mut` version of this. If we want to treat `as *mut` and `as *const` the same, accepting one and rejecting the other makes little sense.
How does `addr_of!()` affect this? This code:
let mut x = 0_i32;
let ptr_x: *const i32 = std::ptr::addr_of!(x);
let mut_ptr_x: *mut i32 = ptr_x as _;
unsafe { *mut_ptr_x = 2; }
creates a `*const i32` without creating a `&i32` first, and currently triggers Miri. The `addr_of!()` documentation doesn’t mention that the resulting pointer can’t be cast to `*mut T` and used for writes, though.
creates a *const i32 without creating &i32 first and currently triggers Miri
Indeed, that's how it currently affects addr_of.
The addr_of!() documentation doesn’t mention that the resulting pointer can’t be cast to *mut T and used for writes, though.
True. It also doesn't say that you can do that. The docs are not exhaustive for what you cannot do. (That would require infinitely large docs.)
There has not been a decision on what the semantics should be here, and that's why the docs basically don't talk about this. It's not great, but absent a decision it's also not clear what else to do. And making the decision without having an entire aliasing model for all the context isn't really a good idea either.
It also doesn't say that you can do that.
It’s true. The way “validity” is currently defined in the standard library docs doesn’t guarantee that any use of pointers from `addr_of!()` (or `addr_of_mut!()`, for that matter) is valid.
Is there a rationale for making `addr_of!()`-produced pointers invalid for writes? I think it’s kind of confusing and doesn’t match the general intuition that `*const _` and `*mut _` raw pointers are interchangeable.
Is there a rationale for making addr_of!()-produced pointers invalid for writes? I think it’s kind of confusing and doesn’t match the general intuition that *const _ and *mut _ raw pointers are interchangeable.
If you want to write through the pointer you would use `addr_of_mut!()`, right? Otherwise what is the point of having two separate macros?
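As a minimal sketch of that suggestion, the snippet below creates the pointer with `addr_of_mut!` when a write is intended and only uses a `*const` view of it for reading. The function name is made up for the example.

```rust
use std::ptr::addr_of_mut;

/// Hypothetical helper: writes through an `addr_of_mut!` pointer and reads the value back.
fn write_via_addr_of_mut() -> i32 {
    let mut x = 0_i32;
    // Created as `*mut` from the start, without going through a reference.
    let p: *mut i32 = addr_of_mut!(x);
    // A `*const` view of the same pointer is enough for reading.
    let q: *const i32 = p;
    // SAFETY: `p` points to the live local `x`, and no reference to `x` is in use.
    unsafe { *p = 2 };
    // SAFETY: `q` points to the same live local.
    unsafe { *q }
}
```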
The addr_of!() documentation doesn’t mention that the resulting pointer can’t be cast to *mut T and used for writes, though.
It actually does, under the examples section of the `addr_of!()` documentation:
See `addr_of_mut` for how to create a pointer to uninitialized data. Doing that with `addr_of` would not make much sense since one could only read the data, and that would be Undefined Behavior.
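The use case that the quoted documentation points to, initializing a value field by field without creating references to uninitialized memory, can be sketched as follows. The `Pair` struct and `init_pair` function are hypothetical names used only for this illustration.

```rust
use std::mem::MaybeUninit;
use std::ptr::addr_of_mut;

/// Hypothetical struct used only for illustration.
struct Pair {
    a: u32,
    b: bool,
}

/// Hypothetical helper: initializes a `Pair` one field at a time.
fn init_pair() -> Pair {
    let mut slot = MaybeUninit::<Pair>::uninit();
    let p: *mut Pair = slot.as_mut_ptr();
    // SAFETY: `p` points to memory owned by `slot`. `addr_of_mut!` computes the
    // field addresses without creating references to uninitialized data, and
    // both fields are written before `assume_init` is called.
    unsafe {
        addr_of_mut!((*p).a).write(7);
        addr_of_mut!((*p).b).write(true);
        slot.assume_init()
    }
}
```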
Ralf's comment here even further propagates this myth, in a thread explicitly asking about the differences here... https://internals.rust-lang.org/t/what-is-the-real-difference-between-const-t-and-mut-t-raw-pointers/6127/18 :
I agree. *const T and *mut T are equivalent in terms of UB.
I feel quoted out of context here -- for the question raised in that particular thread, my statement holds true. But specifically when converting a reference to a raw pointer, there is a difference.
Is there a rationale for making addr_of!()-produced pointers invalid for writes?
Basically, because it matches what the borrow checker does -- see here further up this thread.
Some updates on this:
- With Tree Borrows, `as *const T` and `as *mut T` behave exactly the same, fixing the surprise that triggered this issue.
- It turns out that at least one analysis in rustc actually did assume that "initially `*const`" pointers are not used for mutation; see rust-lang/rust#111502.
- It also turns out some people actually prefer the SB behavior over TB here: having a raw pointer's mutability determined when it initially crosses from safe land to unsafe land.
I agree with @thomcc and everyone else who was surprised by this over the years -- while this model can be rationalized well, it is also almost never what people intuitively expect, so IMO we should avoid it unless there are other major reasons to build things like that. Operationally, `*const` and `*mut` should not make a difference. If we truly want to make `let` bindings (without `UnsafeCell`) immutable, we should achieve that based on the mutability of the binding, not the syntax of the cast. Of course they are not actually immutable since their initial value has to be written in after they are allocated... but I don't think we want that `mut` in `let mut` to be any more than a type system hint that prevents bugs.
To make sure it's remembered, there is some practical justification for `as *const _`/`addr_of!` and `as *mut _`/`addr_of_mut!` behaving differently: they're treated differently by the borrow checker. The `*mut` version is checked as a mutable access, and the `*const` version as immutable.
example
let mut x = &mut 5;
let r = &x;
let _ = x as *mut _;
// ^ERROR: cannot borrow as mutable ... also borrowed as immutable
let _ = x as *const _;
// allowed
let _ = addr_of_mut!(x);
// ^ERROR: cannot borrow as mutable ... also borrowed as immutable
let _ = addr_of!(x);
// allowed
dbg!(r);
This doesn't mean that the opsem has to match this and create a pointer with shared provenance for the `*const` constructions [1], but it does provide a potential justification.
As long as providing derived mut provenance to the pointer doesn't impact the validity of extant provenance until the pointer is accessed, though, I agree that the more permissive model of giving the mut provenance when possible is desirable. (If the more permissive semantics are a pessimization to some code, it can probably be rewritten to introduce a shared reborrow and limit the provenance explicitly. Plus, managing two distinct simultaneously valid sibling raw provenances (one mut and one shr) seems like a nightmare.)
Footnotes
1. At least at some point, the compiler interpreted `type_ascribe!(x, &mut _) as *const _` as going through an intermediate coercion to `&_`, which does limit to shared provenance while that's still the case.
I opened #400 for the specific question of whether `let`-bound variables should be UB to mutate.
@RalfJung the one thing I will point out here is that it does not a priori have to be the case that for `r: &mut u8`, `r as *const _` and `addr_of!(*r)` do the same thing. Maybe they should, but I wouldn't be terribly shocked if the slightly different syntax led people to have different expectations.
I think it would be very surprising if those two ways of turning a mutable ref into a raw ptr would not do the same thing -- I feel fairly strongly they should be the same.
However, I can see the question of mutation of `let`-bound variables being separate from that of mutating through `&mut`-to-`*const`-cast pointers. Hence the separate issue for the former.
Users seem to have some kind of intuition that the expression inside `addr_of{_mut}!` is some kind of special context that provides *waves hands* simpler/less-UB semantics. I think this is a UI issue with it being a macro instead of what it expands to. I think it is quite important that we eventually deprecate the macro and have an operator that does the job (like `&raw`); it would be a great shame if we acquired baggage due to the way we got to a stabilized `&raw`.
In the indefinite future, we should have a stable `#![no_core]`, and when that is stable, not having access to the `addr_of!` semantics in it may be acutely painful; `addr_of!` is exactly the flavor of low-level operation I expect to be common in core-less code.
Two small potential arguments for `addr_of!` not providing write-capable provenance:
- `&raw const place` has a bit more of a "don't write through this" feeling than `addr_of!(place)` does, and definitely more than ref/pointer coercion `&mut place as *const _` (which doesn't even cause an unused `mut` lint).
- Closure capture rules mean that `addr_of!(capture)` still captures by-`ref`, which results in generating a write-incapable pointer as it gets derived from the `ref`-capture.
  - Prior to edition 2021, `addr_of!(place.field)` captures (and thus reference retags) the entire `place`. In edition 2021 and later, each field is captured independently (meaning the rest of `place` doesn't get retagged by the capture).
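The closure-capture point can be illustrated with a small sketch. It relies on the capture behavior described in the bullet above (the variable is captured by shared reference), and the helper `call_shared` is a hypothetical name; the pointer is only read, never written.

```rust
use std::ptr::addr_of;

/// Hypothetical helper that only accepts `Fn` closures.
fn call_shared<F: Fn() -> i32>(f: F) -> i32 {
    f()
}

fn read_in_closure() -> i32 {
    let x = 41;
    // Using `addr_of!(x)` inside the closure still captures `x`, and the
    // closure can be called through `Fn`. The pointer is used for reading only.
    let get = || {
        let p: *const i32 = addr_of!(x);
        // SAFETY: `p` points to the live local `x`.
        unsafe { *p + 1 }
    };
    call_shared(get)
}
```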
For full clarity, I am fully in support of preferring `&mut place as *const _` being `ptr::from_mut(&mut place).cast_const()` and not `ptr::from_ref(&mut place)`. (It is currently more accurately `&raw const *&mut place`.) This is only about `addr_of!`/`&raw const`.
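Until the cast semantics are settled, the intended path can be written out with the library functions named above. The sketch below assumes a toolchain where `ptr::from_mut` and `ptr::from_ref` are available (Rust 1.76 or later); the function name is made up.

```rust
use std::ptr;

/// Hypothetical helper contrasting the two spellings.
fn from_mut_then_const() -> i32 {
    let mut place = 1;
    // `&mut` -> `*mut` -> `*const`: derived from a unique reference.
    let p: *const i32 = ptr::from_mut(&mut place).cast_const();
    // SAFETY: `p` was derived from `&mut place`, points to a live local,
    // and no reference to `place` is in use.
    unsafe { *p.cast_mut() = 3 };
    // `&` -> `*const`: derived from a shared reference, used for reading only.
    let q: *const i32 = ptr::from_ref(&place);
    // SAFETY: `q` points to the live local `place`.
    unsafe { *q }
}
```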
And I still think I weakly favor `&raw const` getting write-capable provenance, because all else being equal, more things being DB and a simpler specification is better. I just think that these observations are interesting to consider.
Because OTPT is nowhere near, I think this is an argument for stabilizing `&raw const` and `&raw mut`. Once they're actually available we can see how people actually expect them to behave.
Closure capture rules mean that `addr_of!(capture)` still captures by-ref, which results in generating a write-incapable pointer as it gets derived from the ref-capture
This sounds like a foot gun. I would have expected it to be captured by raw pointer. But that seems off topic here. I'll open an issue in the main rust repo after investigating this.
TBH, I wouldn't expect a whole new capture mode here.
Changing `&T` -> `*const T` would alter type checking rules in language-visible ways, not just operational semantics. It's almost certainly a breaking change (because of auto traits).
Closure capture is an interesting one. The consistent capture mode would be `&mut`, but that's probably also surprising.
But really the main point to me is that `&raw const *raw_mut_pointer` (and `raw_mut_ptr as *const _`, which compiles to the same MIR) should not lose an existing write permission -- I assume we have consensus on that? Having `&raw const` do different things to the permission depending on the shape of the place expression that follows is a non-compositional nightmare (and I've had to spend my share of time just dealing with that nightmare in Miri; it's particularly bad for `Box`).
I'm curious... what is the point of having two types `*const T` and `*mut T` if they behave the same way?
As a developer, if I call a function that takes `*const T`, I expect that function to never change the value of that variable, even if my original variable is mutable.
Well, it's intended as an indicator to programmers mostly.
Using a `&mut` capture would also alter well-formedness (degrade from `Fn()` to `FnMut()`, and also borrow the type mutably).
We also have `*const i32` and `*const u32` even though they behave in the same way -- or rather, opsem doesn't make a difference between them. Both the pointee type and mutability are hints for the intended use of this pointer, but not hard guarantees/constraints.
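To illustrate the "hints, not constraints" point, the sketch below changes both the pointee type and the mutability of a raw pointer and then changes them back before writing. It assumes a pointer that was originally derived from a `&mut u32`; the function name is hypothetical.

```rust
/// Hypothetical helper: re-types a raw pointer and restores it before writing.
fn write_through_retyped_pointer() -> u32 {
    let mut x: u32 = 1;
    let p: *mut u32 = &mut x;
    // Change both the pointee type and the mutability...
    let q: *const i32 = p.cast::<i32>().cast_const();
    // ...and change them back. The pointer still refers to the same `u32`.
    let back: *mut u32 = q.cast_mut().cast::<u32>();
    // SAFETY: `back` points to the live local `x`, was originally derived from
    // `&mut x`, and no reference to `x` is in use.
    unsafe { *back = 5 };
    x
}
```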