Clarify which inputs are permitted for the offset intrinsic (and the MIR binop)
Closed this issue · 10 comments
The offset intrinsic compiles to LLVM getelementpointer with inbounds attribute, so the offset must stay in-bounds (or one-past-the-end). However, rustc currently also relies on "offset by 0" to always be legal, even if the pointer passed as input is dangling:
- In the IntoIterator impl for slices. If the slice is empty, the pointer will be
(void*) alignment, but the iterator will still add the length (0) to it. - The MIR emitted as slice drop glue does the same, but uses the MIR binop instead of the intrinsic.
So the rules for offset currently seem to be:
- The resulting pointer has to be a valid, in-bounds pointer or one-past-the-end, OR
- the offset is 0.
Note that, in the model I'm thinking about, any non-null pointer with the correct alignment should be valid for zero sized types.
That's fair enough. We also have to be careful to emit the right code for LLVM. One consequence of this could be that in codegen, when we emit GEP, we don't set "inbounds" when we are working on a ZST.
ZSTs are weird, and I'm not sure what llvm would want from them. Having inbounds on a ZST pointer shouldn't be an issue though, because the point of inbounds (afaik) is to keep aliasing data, and ZSTs are impossible to alias.
We should just better be sure LLVM doesn't derive UB from "you said this was inbounds, but I can prove it is not".
yeah, agreed.
It turns out rustc stretches the rules even more: Below https://github.com/rust-lang/rust/blob/445077963c55297ef1e196a3525723090fe80b22/src/libcore/slice/mod.rs#L777, the get methods defer to .offset to compute the address of an element of the slice. This offsets a dangling-pointer-to-ZST by an arbitrary user-controlled amount.
No idea if LLVM permits this. Maybe offset is fine anyway if either the offset or the size is 0 (they are multiplied, after all).
offset on what is llterally a ZST is a nop. We don't even have to lower it to LLVM if it turns out there's a problem.
Also see @arielb1's comment at rust-lang/rust#42789 (comment)
And another one: RawTable::raw_bucket_at will effectively call (0 as *mut u8).offset(0). The LLVM docs make it sound like that should be legal:
The only in bounds address for a null pointer in the default address-space is the null pointer itself.
Closing in favor of younger issues: rust-lang/unsafe-code-guidelines#93, rust-lang/unsafe-code-guidelines#299