rust-lang/rust

Data race in thread::scope

RalfJung opened this issue · 4 comments

The following code sometimes causes a data race to be reported by Miri:

#![feature(atomic_from_mut, inline_const)]
use std::sync::atomic::{AtomicBool, Ordering};

fn main() {
let mut some_bools = [const { AtomicBool::new(false) }; 10];

let view: &mut [bool] = AtomicBool::get_mut_slice(&mut some_bools);
assert_eq!(view, [false; 10]);
view[..5].copy_from_slice(&[true; 5]);

std::thread::scope(|s| {
    for t in &some_bools[..5] {
        s.spawn(move || assert_eq!(t.load(Ordering::Relaxed), true));
    }

    for f in &some_bools[5..] {
        s.spawn(move || assert_eq!(f.load(Ordering::Relaxed), false));
    }
});
}

Miri reports:

note: tracking was triggered
   --> /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/library/std/src/thread/scoped.rs:133:17
    |
133 |       let scope = Scope {
    |  _________________^
134 | |         data: ScopeData {
135 | |             num_running_threads: AtomicUsize::new(0),
136 | |             main_thread: current(),
...   |
140 | |         scope: PhantomData,
141 | |     };
    | |_____^ created allocation with id 1082
    |
    = note: inside `std::thread::scope::<[closure@atomic.rs:12:20: 20:2], ()>` at /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/library/std/src/thread/scoped.rs:133:17
note: inside `main` at atomic.rs:12:1
   --> atomic.rs:12:1
    |
12  | / std::thread::scope(|s| {
13  | |     for t in &some_bools[..5] {
14  | |         s.spawn(move || assert_eq!(t.load(Ordering::Relaxed), true));
15  | |     }
...   |
19  | |     }
20  | | });
    | |__^

error: Undefined Behavior: Data race detected between Deallocate on Thread(id = 0, name = "main") and Read on Thread(id = 2) at alloc1082+0x8 (current vector clock = VClock([114, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8]), conflicting timestamp = VClock([113, 0, 9, 9, 9, 0, 0, 0, 0, 0, 9]))
  --> atomic.rs:12:1
   |
12 | / std::thread::scope(|s| {
13 | |     for t in &some_bools[..5] {
14 | |         s.spawn(move || assert_eq!(t.load(Ordering::Relaxed), true));
15 | |     }
...  |
19 | |     }
20 | | });
   | |__^ Data race detected between Deallocate on Thread(id = 0, name = "main") and Read on Thread(id = 2) at alloc1082+0x8 (current vector clock = VClock([114, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8]), conflicting timestamp = VClock([113, 0, 9, 9, 9, 0, 0, 0, 0, 0, 9]))
   |
   = help: this indicates a bug in the program: it performed an invalid operation, and caused Undefined Behavior
   = help: see https://doc.rust-lang.org/nightly/reference/behavior-considered-undefined.html for further information

   = note: inside `main` at atomic.rs:12:1

The deallocation occurs when scope returns and its local variable scope gets deallocated.

The read occurs here (I am fairly sure):

if self.num_running_threads.fetch_sub(1, Ordering::Release) == 1 {
self.main_thread.unpark();

This reads the contents of the field self.main_thread after the fetch_sub. But the moment the fetch_sub is done, the memory backing self might be deallocated. That makes the read a potential use-after-free, and a race with the deallocation.

Cc @m-ou-se

The following patch fixes this:

diff --git a/library/std/src/thread/scoped.rs b/library/std/src/thread/scoped.rs
index a387a09dc8b..06019cbc75b 100644
--- a/library/std/src/thread/scoped.rs
+++ b/library/std/src/thread/scoped.rs
@@ -55,8 +55,10 @@ pub(super) fn decrement_num_running_threads(&self, panic: bool) {
         if panic {
             self.a_thread_panicked.store(true, Ordering::Relaxed);
         }
+        // The moment we do the `fetch_sub`, the memory we ourselves are stored in can be deallocated.
+        let main_thread = self.main_thread.clone();
         if self.num_running_threads.fetch_sub(1, Ordering::Release) == 1 {
-            self.main_thread.unpark();
+            main_thread.unpark();
         }
     }
 }

However, even with this patch, we still run into a problem like #55005: the &self gets deallocated while decrement_num_running_threads still runs. This is still UB even after my proposed spec change because self contains a field that is not inside an UnsafeCell (the main_thread field). So that part of what &self points to is expected to stay live for the entire duration of the function call.

Another alternative might be to put the entire Scope into an Arc rather than allocating it on the stack frame of scope.

Going to attach this to the 1.63 milestone, since this is in a newly stabilized feature -- we may wish to back out stabilization for a cycle or backport fixes.

Re-opening for beta backport.

The fix was backported in #98949.