rust-lang/rust

Tracking Issue for allowing zero-sized memory accesses and offsets

Closed this issue · 11 comments

This issue tracks implementing the t-opsem decision in rust-lang/unsafe-code-guidelines#472. This will require adjustments in many places (codegen, Miri, library docs, reference, ...). The intention is to track here what needs to be done until the transition is complete.

Implementation history

bjorn3 commented

cg_clif accepts ZST memory accesses and pointer offsets already. Pointer offsets are implemented as integer addition which doesn't have UB and ZST memory accesses never get turned into loads and stores in cranelift ir as there is no instruction that does so.

memory accesses never get turned into loads and stores in cranelift ir as there is no instruction that does so.

Besides direct accesses, the other concerns are the copy, write_bytes, compare_bytes intrinsics. Those must be implemented in a way that they are not UB when elem_count*elem_size is 0.

They are implemented by calling the respective libc functions which LLVM already expects to accept 0-sized accesses, right?

GCC codegen might also need updating, Cc @antoyo @GuillaumeGomez

They are implemented by calling the respective libc functions which LLVM already expects to accept 0-sized accesses, right?

Well what LLVM assumes doesn't matter for the cranelift backend, does it? ;) But more importantly, Rust explicitly assumes this itself as documented here.

No problem. Please ping us when we need to update our part and thanks for the ping!

Well I'm asking you if you need to update anything. :) You need to make sure that the Offset MIR binop is compiled in a way that offset by 0 bytes is always Defined Behavior even if the pointer operand is null or dangling or out of bounds or whatever.

I think zero-sized memory accesses disappear in the SSA codegen infrastructure before your backend even sees them so they should be fine.

And finally the copy, copy_nonoverlapping, write_bytes, compare_bytes intrinsics need to be lowered in a way that they are Defined Behavior when the size is 0, even if the pointers are null or dangling or whatever.

These intrinsics are implemented by calling the GCC builtin functions: memcmp, memset, memcpy, memmove.
I'll double-check, but it seems fine to have a count of zero, but not NULL pointers.

Okay, something needs to change then in the backend because we'll allow null pointers for the Rust intrinsics.

I updated #117329 to make it ready to land.

@rust-lang/opsem to address this concern, I made offset_from have library UB but not language UB on two pointers with the same address but provenance for different allocations. Please let me know what you think.

I created a reference PR at rust-lang/reference#1541. That should complete the implementation of this feature. :) (Aside from the GCC backend, which is tracked separately at rust-lang/rustc_codegen_gcc#516.)