rust-lang/rust-memory-model

Compiler optimizations considered harmful in safe function in "unsafe module"

Closed this issue · 20 comments

Consider the following contrived module:

mod strange_unsafe {
    use std::ptr;

    struct StrangeUnsafe {
        magic: i32, // invariant:  This is always 42
        strange: *mut StrangeUnsafe, // this points back to ourselves (we only ever occur in a Box, so that's possible)
    }
    
    impl StrangeUnsafe {
        fn check(s: *mut StrangeUnsafe) {
            assert_eq!(unsafe { (*s).magic }, 42);
        }
    }
    
    pub struct PubStrangeUnsafe(Box<StrangeUnsafe>);
    
    impl PubStrangeUnsafe {
        pub fn new() -> Self {
            let mut s = Box::new(StrangeUnsafe { magic: 42, strange: ptr::null_mut() });
            let addr = &mut *s as *mut _;
            s.strange = addr;
            PubStrangeUnsafe(s)
        }
        
        pub fn do_sth(&mut self) {
            self.0.magic = 13;
            self.0.magic = 42;
            StrangeUnsafe::check(self.0.strange);
        }
    }
}

Playpen

What is strange or troublesome about this? do_sth looks like a perfectly safe function -- it's not marked unsafe and doesn't contain an unsafe block -- so we may think that we can do the usual optimizations that we think we can do in safe functions. However, if the compiler decides to move the store of 42 down below the function call, the module's invariant is violated in a way that the module can observe. That's bad.

One may say the actual cause of the trouble here is the fact that check is not marked as unsafe -- after all, that function doesn't actually work with any odd raw pointer. However, check is internal, and it seems to be a long stretch to rely on everyone annotating their purely internal helper functions properly.

Cc @nikomatsakis @arielb1

@RalfJung that's undefined behavior - you're aliasing mutably with a reference. The subobject is referenced by self, and you write to it - thereby saying "I promise that from this point forward, I am the only pointer to this that shall be read or written to, until I stop being used." Therefore, this is invalid in both unsafe and safe code. You're breaking the invariants of references.

At least, that's my opinion of it.

That's one possible answer, but I feel that could actually rule out real-world unsafe code. Also, specifying exactly when "I stop being used" is happening doesn't seem an easy task. Once you are saying that it matters which of multiple "identical" pointers you are using to access a given location, that opens a whole can of worms.

Also, saying just "from this point forward" would not permit moving loads/stores up across function calls. Seems like an access should have an effect "backwards in time"?

@RalfJung I need to re-write down the wording that I used. Basically, in this case, since neither pointer is derived-from the other, it is known that they do not alias.

and yes, it does open a can of worms - it's very important to open this can of worms. What about this example is different from the (definitely UB) example following?

fn check(y: *mut i32) {
  assert!("", unsafe { *y } == 42);
}

fn foo(x: &mut i32, y: *mut i32) {
  x = 0;
  x = 42;
  check(y);
}

let mut int = 0;
let ptr = &mut int as *mut _;
foo(&mut int, ptr);

The fact that my example involves an "unsafe type" that has its own invariant attached to it -- an invariant that makes a naive interpretation of my code work fine -- could make a difference.
It just seems really daring to me to rule out such code, because I expect things like that to easily arise when writing a safe abstraction involving raw pointers that do alias with some of the references that are handed to the outside world.

Imagine some form of tree where every node has a raw pointer back to the root. Some public method in this library could take the entire data structure via &mut, and then invoke operations on the tree that do involve the raw pointers. That should be possible, and it shouldn't be too hard to get it right. In such a library, I don't think we want to optimize loads or stores across function calls, that just sounds too dangerous.

@RalfJung it may have to be a special case - pointers that live inside the object you have a reference to.

(FWIW, I've written code that is very close to this in Rayon -- but not the same. In particular, the code had a "self-link" of this kind, but that self-link is an Arc<T>. This creates an (intentional) reference cycle that gets broken at some point by clearing the self-link, allowing the arc to be collected.)

This is a good example though that gets right at the question of what the boundary for unsafe ought to be. I really feel like we need a way to make that boundary clearer -- this might be unsafe fields, I'm not sure -- and that it is inevitable that such boundaries will play a role in optimization. I still favor the idea that the boundary for unsafety follows the privacy rules, and that we add some sort of level (unchecked) which means "there may be safety invariants, but all the types have their usual meanings". This would be suitable for e.g. <[T]>::get_unchecked or String::from_utf8_unchecked.

@nikomatsakis I still disagree that unsafe should make anything special - all code should have the same invariants, and, imo, unsafe should only be used as a lint. Otherwise, I think people will be confused that their code works in unsafe, but not in safe.

But it won't even compile in safe -- the compiler will complain that you are performing unsafe operations.

@RalfJung I'm talking about stuff that's in your example above.

So am I. If we say that unsafe makes the entire surrounding module special, then you won't have that problem where something silently stops working if you make it safe. It will instead stop compiling.

@RalfJung that both seems like a massive pessimization, and only moves the problem. Now it's "if someone creates an implementation module".

It's not "just" moving the problem -- being in a separate module makes the privacy rules kick in.

@RalfJung that doesn't really matter for internal modules? You can have intermodule invariants.

Hm, yeah, you can. Maybe we require some annotation if that's what you do? Not sure. I don't claim I have all the answers. ;)

Do you have an example of a real crate doing something like this?

@RalfJung no, but I don't write very much unsafe code. Maybe something like rayon? None of this seems like super unreasonable code.