memorysafety/rav1d

Safe functions cannot impose constraints on callers

Opened this issue ยท 7 comments

...and be sound, anyways.

To demonstrate this, I recommend you add the following unit test to src/disjoint_mut.rs and count with me:
that famous purple vampire Muppet, Count von Count

#[test]
fn count_von_disjoint_mut() {
    // one, one little vector, ah ha ha ha ha!
    let v = vec![1, 2, 3, 4, 5]; // five, five little integers! ah ha ha ha ha!
    let disjoint_slice = DisjointMut::new(v);
    // one, one &mut to the first index, ah ha ha ha ha!
    let mut ref_1_a = DisjointMut::index_mut(&disjoint_slice, 0);
    // two, two &mut to the first index, ah ha ha ha ha!
    let mut ref_1_b = DisjointMut::index_mut(&disjoint_slice, 0);
    // one, one write to the first index, ah ha ha ha ha!
    *ref_1_a.deref_mut() = 5;
    // two, two writes to the first index, ah ha ha ha ha!
    *ref_1_b.deref_mut() = 10;
    // one, one undefined behavior from safe code, ah ha ha ha ha!
}

This code is rejected by both Miri's stricter and more permissive experimental borrowing models:

cargo miri test --lib count --release
running 1 test
test src::disjoint_mut::count_von_disjoint_mut ... error: Undefined Behavior: trying to retag from <188766> for Unique permission at alloc242642[0x0], but that tag does not exist in the borrow stack for this location
   --> src/disjoint_mut.rs:163:9
    |
163 |         self.slice
    |         ^^^^^^^^^^
    |         |
    |         trying to retag from <188766> for Unique permission at alloc242642[0x0], but that tag does not exist in the borrow stack for this location
    |         this error occurs as part of retag at alloc242642[0x0..0x4]
    |
    = help: this indicates a potential bug in the program: it performed an invalid operation, but the Stacked Borrows rules it violated are still experimental
    = help: see https://github.com/rust-lang/unsafe-code-guidelines/blob/master/wip/stacked-borrows.md for further information
help: <188766> was created by a Unique retag at offsets [0x0..0x4]
   --> src/disjoint_mut.rs:375:23
    |
375 |     let mut ref_1_a = DisjointMut::index_mut(&disjoint_slice, 0);
    |                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
help: <188766> was later invalidated at offsets [0x0..0x4] by a Unique retag
   --> src/disjoint_mut.rs:328:30
    |
328 |         let slice = unsafe { &mut *index.get_mut(self.as_mut_slice()) };
    |                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    = note: BACKTRACE (of the first span) on thread `src::disjoint_m`:
    = note: inside `<src::disjoint_mut::DisjointMutGuard<'_, std::vec::Vec<i32>, i32> as std::ops::DerefMut>::deref_mut` at src/disjoint_mut.rs:163:9: 163:19
note: inside `src::disjoint_mut::count_von_disjoint_mut`
   --> src/disjoint_mut.rs:379:6
    |
379 |     *ref_1_a.deref_mut() = 5;
    |      ^^^^^^^^^^^^^^^^^^^
note: inside closure
   --> src/disjoint_mut.rs:370:28
    |
369 | #[test]
    | ------- in this procedural macro expansion
370 | fn count_von_disjoint_mut() {
    |                            ^
    = note: this error originates in the attribute macro `test` (in Nightly builds, run with -Z macro-backtrace for more info)

note: some details are omitted, run with `MIRIFLAGS=-Zmiri-backtrace=full` for a verbose backtrace

error: aborting due to 1 previous error; 4 warnings emitted

error: test failed, to rerun pass `-p rav1d --lib`
MIRIFLAGS="-Zmiri-tree-borrows" cargo miri test --lib count --release
running 1 test
test src::disjoint_mut::count_von_disjoint_mut ... error: Undefined Behavior: reborrow through <182117> at alloc242492[0x0] is forbidden
   --> src/disjoint_mut.rs:163:9
    |
163 |         self.slice
    |         ^^^^^^^^^^ reborrow through <182117> at alloc242492[0x0] is forbidden
    |
    = help: this indicates a potential bug in the program: it performed an invalid operation, but the Tree Borrows rules it violated are still experimental
    = help: the accessed tag <182117> is a child of the conflicting tag <182107>
    = help: the conflicting tag <182107> has state Disabled which forbids this reborrow (acting as a child read access)
help: the accessed tag <182117> was created here
   --> src/disjoint_mut.rs:377:23
    |
377 |     let mut ref_1_b = DisjointMut::index_mut(&disjoint_slice, 0);
    |                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
help: the conflicting tag <182107> was created here, in the initial state Reserved
   --> src/disjoint_mut.rs:328:30
    |
328 |         let slice = unsafe { &mut *index.get_mut(self.as_mut_slice()) };
    |                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
help: the conflicting tag <182107> later transitioned to Disabled due to a foreign write access at offsets [0x0..0x4]
   --> src/disjoint_mut.rs:379:5
    |
379 |     *ref_1_a.deref_mut() = 5;
    |     ^^^^^^^^^^^^^^^^^^^^^^^^
    = help: this transition corresponds to a loss of read and write permissions
    = note: BACKTRACE (of the first span) on thread `src::disjoint_m`:
    = note: inside `<src::disjoint_mut::DisjointMutGuard<'_, std::vec::Vec<i32>, i32> as std::ops::DerefMut>::deref_mut` at src/disjoint_mut.rs:163:9: 163:19
note: inside `src::disjoint_mut::count_von_disjoint_mut`
   --> src/disjoint_mut.rs:381:6
    |
381 |     *ref_1_b.deref_mut() = 10;
    |      ^^^^^^^^^^^^^^^^^^^
note: inside closure
   --> src/disjoint_mut.rs:370:28
    |
369 | #[test]
    | ------- in this procedural macro expansion
370 | fn count_von_disjoint_mut() {
    |                            ^
    = note: this error originates in the attribute macro `test` (in Nightly builds, run with -Z macro-backtrace for more info)

note: some details are omitted, run with `MIRIFLAGS=-Zmiri-backtrace=full` for a verbose backtrace

error: aborting due to 1 previous error; 4 warnings emitted

error: test failed, to rerun pass `-p rav1d --lib`

The "safety preconditions" referenced on line 324 are exactly what a safe function CANNOT impose!

rav1d/src/disjoint_mut.rs

Lines 318 to 330 in 7d72409

pub fn index_mut<'a, I>(&'a self, index: I) -> DisjointMutGuard<'a, T, I::Output>
where
I: Into<Bounds> + Clone,
I: DisjointMutIndex<[<T as AsMutPtr>::Target]>,
{
let bounds = index.clone().into();
// SAFETY: The safety preconditions of `index` and `index_mut` imply
// that the indexed region we are mutably borrowing is not concurrently
// borrowed and will not be borrowed during the lifetime of the returned
// reference.
let slice = unsafe { &mut *index.get_mut(self.as_mut_slice()) };
DisjointMutGuard::new(self, slice, bounds)
}

A function is most definitely not safe if calling it twice with the same input (assuming it is possible to safely be called twice... this is the entire reason types that you can't clone exist, after all...) is possible and causes undefined behavior. While DisjointMut's API is currently not a public detail of the library, this function is nonetheless in fact an unsound function by the rules of Rust, because it should be marked unsafe, but isn't.

And there are plenty of ways to achieve unsound behavior from entirely internal code that goes undetected by testing but is actually exercised by real public usage of the API. I can link you to dozens of examples from other codebases that thought the "checked in debug, unchecked at runtime" was sufficient to establish safety on its own, if you like.

I mean, I think it's a rather nice type even if index_mut (and probably a few other fn) has to be unsafe! But in its current form, with them falsely marked as safe fn, prevents correctly linting against unsafe code. It bypasses such linting. Your codebase probably will not exceed compilation memory usage limits just by processing a few more unsafe tokens, will it?

There's plenty of weird holes to fall down, too, even with the debug assertions in play, because debug assertions firing isn't necessarily good enough if you put the panic in the wrong place. The panic should go before the safety condition is violated. e.g. this also fails cargo miri test --lib, and without --release or anything:

#[test]
fn less_cute_this_time() {
    use std::panic::*;
    let v = vec![1, 2, 3, 4, 5];
    let disjoint_slice = DisjointMut::new(v);
    let mut ref_1_a = DisjointMut::index_mut(&disjoint_slice, 0);
    let mut ref_1_b = catch_unwind(AssertUnwindSafe(|| DisjointMut::index_mut(&disjoint_slice, 0)) );
    *ref_1_a.deref_mut() = 5;
}

Hi! Thanks for going into detail on this, especially with the numerous examples!

We understand that DisjointMut's methods are unsound as a public API, and had originally marked them unsafe due to this. However, they are pub(crate), and so we have decided that rather than mark every usage of DisjointMut, and thus much of the codebase, as unsafe, we can altogether ensure that every usage of DisjointMut indexing within this crate is indeed disjoint and thus sound:

/// Since [`DisjointMut`] and [`disjoint_mut`](module@self) are only `pub(crate)`,
/// we can ensure all uses are disjoint and thus sound.

We don't think that individual safety comments on every index are easier to understand than a more centralized analysis of how indices and ranges are split up among tasks and threads, and thus that they would in fact be more unsafe, because the justification logic is much harder to understand.

Ultimately, we need a solution to this that is performant enough and as safe as possible. rav1d already depends on a ton of unsafe assembly, for performance, and until we can find a better solution that is safer and still performant, or use this performant but not quite sound solution, dav1d will be used instead, and that will be far less safe.

The panic should go before the safety condition is violated.

This is a great point that we didn't consider. We should be able to fix this.

Marked or not, from what you're saying, much of the codebase (namely, the parts that index_mut a DisjointMut) is unsafe. The whole point of Rust's safety model is that it allows the compiler to make very strong guarantees to the programmer about the behavior of safe code. Any unsoundness, even if it's localized within a crate, is enough to create a dangerous footgun for contributors who (reasonably) expect safe rust's soundness guarantees to completely protect their safe code from UB.

A contributor or user can easily ignore or otherwise miss a doc comment on a function. They can't ignore a compile error that tells them to explicitly opt-in to a potentially unsafe function (and encourages them to read that function's safety documentation to learn about its preconditions).

Aside from that, another option here is replacing all usages of DisjointMut with arrays/vecs of relaxed atomics (which would make this shared mutable buffer pattern completely thread safe, at least on the Rust side), although that would likely come at a performance penalty (purely speculation, probably worth a benchmark)

probably more than slight, I think.

Aside from that, another option here is replacing all usages of DisjointMut with arrays/vecs of relaxed atomics (which would make this shared mutable buffer pattern completely thread safe, at least on the Rust side), although that would likely come at a performance penalty (purely speculation, probably worth a benchmark)

We did consider this, and personally I did want to fully try it out and see how it's performance was.

The two major downsides to this that we saw were 1) performance as you said. While relaxed atomic load and stores are identical to a normal load and store for a single isolated instruction, they're not optimized the same. Two consecutive, 16-bit aligned AtomicU8 relaxed loads won't get combined into a single AtomicU16 relaxed load, even though the wider load is still atomic. And it's similar for wider, SIMD widths, which are often atomic, but have no Rust atomic type because they're not always atomic. I was in favor of attempting this, however, and seeing what the performance impact would be in practice.

And 2) assembly. All of these DisjointMut buffers are passed to assembly, and we'd have to walk sure they are used with relaxed atomic instructions in the assembly. I'm not sure exactly how to verify this in the assembly and what kinds of instructions technically are relaxed atomic, much less verify this for all of the assembly.

I strongly urge you to adjust the methods so that they are correctly defined as unsafe methods. Otherwise the codebase is comparatively hostile to outside contributors looking at the code, who won't see unsafe blocks around the method calls, and won't know there's rules they need to check to avoid UB.

We don't think that individual safety comments on every index are easier to understand than a more centralized analysis of how indices and ranges are split up among tasks and threads,

Then don't do that. Safety comments are suggested documentation, but they're still just documentation. If the unsafe usage is just // Safety: See type docs that's fine documentation. You don't need a full paragraph on each unsafe {.

I strongly urge you to adjust the methods so that they are correctly defined as unsafe methods. Otherwise the codebase is comparatively hostile to outside contributors looking at the code, who won't see unsafe blocks around the method calls, and won't know there's rules they need to check to avoid UB.

We don't think that individual safety comments on every index are easier to understand than a more centralized analysis of how indices and ranges are split up among tasks and threads,

Then don't do that. Safety comments are suggested documentation, but they're still just documentation. If the unsafe usage is just // Safety: See type docs that's fine documentation. You don't need a full paragraph on each unsafe {.

The only level at which you can guarantee that all calls are sound, and thus mark safe, is the very top level calls. So essentially the whole codebase becomes unsafe, and that doesn't seem more contributor friendly either. It would also hide other usages of unsafe, though I think there may be some clippy lints for this.

Using an unsafe block does not mean "this code is sound without looking anywhere else".

Each unsafe block is correct only when cooperating with all other unsafe blocks within the same API bubble. In this case perhaps that's "the entire API", if all parts of the code are all using DisjointBuffer. However, if that's the case, it's better to be just be honest about how big the bubble is. If everything that touches a DisjointBuffer needs to be careful, you're not really serving anyone by not telling them "be careful".

And I really don't understand what you mean by some unsafe would cover up other unsafe. You can use smaller blocks if separate spans have separate logic.