rust-lang/rfcs

Feature: unchecked access to enum interior


I'll use Option as an example, but this applies to any enum:

Sometimes you may have a *mut Option<T> and want to get a *mut T from it, without dereferencing the pointer and without checking whether it is actually a Some(T). AFAICT, this is currently impossible to do in Rust unless you have some way to construct a valid T (which would allow calculating an offset beforehand and then using pointer arithmetic).

Using mem::uninitialized(), or mem::zeroed() to construct a T before-hand to calculate the pointer offset doesn't work, because of enum layout optimization, which means it could be UB.
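To make the layout-optimization point concrete, here is a minimal sketch. The helper name option_uses_niche is made up for illustration. When Option<T> is the same size as T, None is stored as an otherwise invalid bit pattern of T, so a zeroed "Some" of such a type can actually be None (for a reference, the all-zero pattern is the null niche).

```rust
use std::mem::size_of;

/// Hypothetical helper: true when `Option<T>` needs no extra space,
/// which means `None` is encoded in a niche (an invalid bit pattern) of `T`.
fn option_uses_niche<T>() -> bool {
    size_of::<Option<T>>() == size_of::<T>()
}
```

For example, option_uses_niche::<&u8>() holds, while option_uses_niche::<u32>() does not, because every bit pattern of u32 is a valid value and the discriminant needs separate storage.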

This came up while writing some concurrency code: in this case dereferencing the pointer would be invalid because the memory may be written to concurrently. The unsafe code can be correct because it checks at run time that it has exclusive access, and that the value was indeed a Some(T), before it ever tries to dereference the pointer.

Does this work for you? https://docs.rs/unreachable/0.1.1/unreachable/trait.UncheckedOptionExt.html

Under the hood this uses a match expression which in the None case unsafely creates a &Void (with enum Void {}) and matches on that. The optimizer assumes that this code is "impossible" (unreachable), and should then eliminate the branch in the original match expression.
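A rough sketch of the same idea, assuming std::hint::unreachable_unchecked from the standard library in place of the crate's &Void trick (so this is not the crate's actual implementation, and the function name is made up):

```rust
use std::hint::unreachable_unchecked;

/// Sketch of an unchecked unwrap.
///
/// # Safety
/// The caller must guarantee that `opt` is `Some`; passing `None` is
/// undefined behaviour.
unsafe fn unwrap_unchecked_sketch<T>(opt: Option<T>) -> T {
    match opt {
        Some(value) => value,
        // Promise the optimizer that this arm can never be reached.
        None => unsafe { unreachable_unchecked() },
    }
}
```

Note that this takes the Option by value, so the caller must already be allowed to read it. The standard library's Option::unwrap_unchecked has the same limitation.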

If you want to make this slightly less dangerous, https://crates.io/crates/debug_unreachable adds a check in debug mode only.

@SimonSapin I can't use that directly since I need a pointer to the interior, and dereferencing the initial pointer at all is UB. A similar approach might work in release mode, as a way to calculate the pointer offset beforehand, but I'd need it to work in debug mode too (also, relying on compiler optimizations for correctness is not ideal!)

Sorry, I don’t understand: why is it UB to dereference that *mut Option<T> pointer if it’s not UB to do anything at all with it?

The timeline looks approximately like this:

  • Read *mut Option<T> from atomic variable
  • Take *mut Option<T> and convert it to a *mut T
  • Try obtaining exclusive access to the data, via an atomic compare-exchange
  • If successful, dereference the *mut T
  • If failed, throw away the *mut T and try again

Until I know I have exclusive access to the pointer, it would be UB to try dereferencing it (data race) but I can't wait until I've got exclusive access to convert the pointer.
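The timeline above might be sketched as follows. Everything here is an assumption made for illustration: the function name claim, the use of a null pointer to mean "taken", and in particular the project parameter, which stands in for the missing pointer-to-interior conversion that this issue asks for.

```rust
use std::ptr;
use std::sync::atomic::{AtomicPtr, Ordering};

/// Tries to claim the value published in `slot` (hypothetical protocol).
/// `project` must turn the outer pointer into an interior pointer
/// without dereferencing it.
fn claim<T>(
    slot: &AtomicPtr<Option<T>>,
    project: fn(*mut Option<T>) -> *mut T,
) -> Option<*mut T> {
    loop {
        // Read the *mut Option<T> from the atomic variable.
        let outer = slot.load(Ordering::Acquire);
        if outer.is_null() {
            return None;
        }
        // Convert to *mut T. No dereference is allowed yet.
        let inner = project(outer);
        // Try to obtain exclusive access via compare-exchange.
        if slot
            .compare_exchange(outer, ptr::null_mut(), Ordering::AcqRel, Ordering::Acquire)
            .is_ok()
        {
            // Success: the caller may now dereference `inner`.
            return Some(inner);
        }
        // Failure: throw `inner` away and try again.
    }
}
```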

Afaik, there is not necessarily any valid pointer to the interior of an Option<NonZero<_>>, which includes pointer types like Option<Unique<_>> and Option<Shared<_>> and likely anything built using them.

Can you build up a custom trait that provides this pointer when it exists? Roughly:

unsafe trait InteriorRef {
    type Interior;
    unsafe fn interior_mut(this: *mut Self) -> *mut Self::Interior;
    unsafe fn interior(this: *const Self) -> *const Self::Interior {
        // Pure pointer cast; nothing is dereferenced here.
        Self::interior_mut(this as *mut Self) as *const Self::Interior
    }
}

unsafe impl<T> InteriorRef for Option<T> where T: Copy + Default {
    type Interior = T;
    unsafe fn interior_mut(s: *mut Self) -> *mut T {
        // Build a known-valid Some(T) to measure where the payload sits.
        let mut x = Some(T::default());
        let base = &mut x as *mut Self as usize;
        let y = x.as_mut().unwrap() as *mut T as usize;
        // Apply the same offset to `s` without dereferencing it.
        (s as usize + (y - base)) as *mut T
    }
}

@burdges There is always a valid pointer to the interior of an Option<T> - even with layout optimization, the Some(T) case is guaranteed to have a T as part of its layout. This is how Option::as_ref() works.
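As a small illustration of that point, a sketch (with a made-up function name) that goes through Option::as_mut, the mutable counterpart of as_ref. It only applies when the Option is valid and exclusively borrowed, so it does not solve the raw-pointer case discussed in this thread.

```rust
/// Returns a pointer to the payload of a valid, exclusively borrowed
/// `Option<T>`, or `None` if there is no payload.
fn interior_of_valid<T>(opt: &mut Option<T>) -> Option<*mut T> {
    opt.as_mut().map(|payload| payload as *mut T)
}
```

For a niche-optimized type such as Option<&u8>, the returned pointer has the same address as the Option itself, since the two types have the same size.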

I found a (horrible) workaround for my specific Option case:

use std::mem;

fn unwrap_unchecked<T>(x: *mut Option<T>) -> *mut T {
    // Assumes the payload occupies the last size_of::<T>() bytes of the Option.
    let offset = mem::size_of::<Option<T>>() - mem::size_of::<T>();
    (x as usize + offset) as *mut T
}

Obviously this will fail if Rust ever adds more advanced layout manipulations, but hopefully this issue is resolved properly before then...
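One way to sanity-check the size-difference assumption for a given type is to compare it against an offset measured from a valid value. This is only a sketch with made-up helper names, and it needs a real T to probe with, which the original problem does not always have.

```rust
use std::mem::size_of;

/// The offset that the size-difference workaround assumes.
fn guessed_offset<T>() -> usize {
    size_of::<Option<T>>() - size_of::<T>()
}

/// The offset actually observed for a valid `Some(sample)`.
fn measured_offset<T>(sample: T) -> usize {
    let mut probe = Some(sample);
    let base = &mut probe as *mut Option<T> as usize;
    let field = probe.as_mut().unwrap() as *mut T as usize;
    field - base
}
```

If the two values ever differ for some T, the workaround would produce a wrong pointer for that type. The language does not guarantee that they agree in general.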

@Diggsey
Hi, sorry, we already have them. They're just off by default at the moment, but won't be for much longer.

They'd be on, but a ton of personal things came up, so the last pull request that actually enables it isn't in yet.

@camlorn In that case, the only option left would seem to be to create a bit pattern which is likely to be within the domain of T (e.g. 0x80808080...), construct a Some::<T> from that, and use it to determine the enum layout beforehand via Option::as_ref().

comex commented

That sounds like a pretty horrific hack :)

Have you considered replacing your use of Option with a union?

edit: This would be a nice feature to have though.
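A minimal sketch of what the union suggestion might look like, using MaybeUninit (itself a union) plus an explicit flag. The Slot type, its field names, and payload_ptr are hypothetical and not taken from the code under discussion.

```rust
use std::mem::MaybeUninit;
use std::ptr;

/// Hypothetical replacement for `Option<T>` with a fixed, known layout.
#[repr(C)]
struct Slot<T> {
    occupied: bool,
    value: MaybeUninit<T>,
}

/// Projects to the payload without reading memory or creating a reference.
///
/// # Safety
/// `slot` must point into a live allocation large enough for a `Slot<T>`.
unsafe fn payload_ptr<T>(slot: *mut Slot<T>) -> *mut T {
    // addr_of_mut! only computes the field address; it performs no load.
    unsafe { ptr::addr_of_mut!((*slot).value).cast::<T>() }
}
```

The trade-off is that the niche optimization is lost: the flag always takes its own space, even when T is a reference or another type with spare bit patterns.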

eddyb commented

My take on this is that even a really cut down version of #1450 would be enough here, and that's more likely to get in (the implementation is already there in part, as MIR requires downcasts to variants) than any hack.

> Using mem::uninitialized(), or mem::zeroed() to construct a T before-hand to calculate the pointer offset doesn't work, because of enum layout optimization, which means it could be UB.

Could someone explain that part in more detail? Is it UB to construct such a pointer even if it's never dereferenced? Or would it have to get dereferenced in some way?

Edit: Hmm, I think I understand part of the problem. Could something like this be correct?

use std::mem;

fn option_offset<T>() -> usize {
    // Avoid creating an uninitialized Some if the null pointer optimization is
    // in effect, because it's not actually guaranteed to be Some.
    if mem::size_of::<T>() == mem::size_of::<Option<T>>() {
        return 0;
    }
    let dummy: Option<T> = unsafe { Some(mem::uninitialized()) };
    let dummy_ptr = &dummy as *const Option<T> as usize;
    let interior_ptr = dummy.as_ref().unwrap() as *const T as usize;
    mem::forget(dummy);
    interior_ptr - dummy_ptr
}