crossbeam-rs/rfcs

Guards can be sound

Opened this issue · 15 comments

In the Atomic API RFC, I argued that we should model pinning using scopes rather than guards, for three reasons:

  1. Guards are tricky to implement correctly (in a sound manner).
  2. Guards were a tiny bit slower in my microbenchmarks.
  3. Guards are less explicit and thus easier to misuse (by e.g. accidentally pinning for too long).

This time I'd just like to show that the first point is invalid - it's actually pretty easy to make guards sound.

Here's a quick refresher. The problem was the following: if you pin the current thread and obtain a guard, you can keep having the guard for a very long time, even after the thread-local Handle gets destructed. Using it after that would be unsafe.

And here's the solution... Currently, when traversing the linked list of registered threads, we remove a node if its next pointer is tagged with 1. We should just have an additional check: remove it if the tag is 1 and if the thread is unpinned. That's all we need to change.

cc @hniksic Guards would make the following possible (mentioned here):

pub get_ref(&'a self, key: K) -> Option<ValueGuard<'a, V>>;

@jeehoonkang

Would the guards model help you in implementing QSBR-style reclamation, as you described in a previous comment?

If a guard can exist after the corresponding handle is dropped, it'll be usable in helping!

In order to make sure that I understood your solution correctly, I'd like to ask few questions. I think when a handle is dropped, the underlying local bag is also dropped. So each guard should have its own bag. That means all the guards derived from a single handle share only the local epoch in the registry. Am I understanding correctly?

Also, we need to somehow track the number of (guards + pins) for each handle. We can store it in the registry or we can manage it in an Rc shared by a handle and guards. Is it correct?

EDIT: maybe a handle and guards can share a Rc<Bag>..

I implemented Guard in an experimental branch: https://github.com/jeehoonkang/crossbeam-epoch/tree/guard

I changed Local in internal.rs and Handle in collector.rs. Now a local has more data than before, and a handle is just a thin wrapper around Rc<Local>. When a guard is created from a handle, the Rc<Local> is cloned.

A caveat is that in order to access Global from a Handle, we need two pointer dereferences: first to Local, and then to Global. Maybe we need to optimize it a little bit more.

Please have a look!

If we don't need to get a guard from a handle, then the implementation can be much simpler: https://github.com/jeehoonkang/crossbeam-epoch/tree/guard-from-collector Note that it's enough for helping. In this branch, a guard can be created directly from a collector. Each guard or handle created from a collector is registered in the collector.

It's probably possible to provide Guards, but the question is, do we really need them? With the addition of safepoints we can effectively have quiescent periods, but what does adding Guards achieve that was not possible or otherwise difficult to do before? If there exists an important use case for them, maybe they're a good idea, but otherwise I'm unsure whether adding additional complexity to the system is worth it.

EDIT: Oh right, I missed that they allow returning references from concurrent structures, but I'd say that it's not necessarily a good idea, since such a Guard keeps a thread pinned and it's up to the user to drop it quickly rather than extend the critical section indefinitely. On the other hand, something like with_value<FnOnce(&Val) -> R>(f: F) -> R makes the scope more explicit, which is much nicer in my opinion.

@Vtec234 I described one of guard's use case on wait-free construction (helping): #18 (comment)

I agree that Guard may introduce very long latency, and it should be regarded as an API for experts.

It's probably possible to provide Guards, but the question is, do we really need them?

Pin guard and scope can both be used incorrectly. The reason I proposed to keep the guard-based pinning API is that it is more expressive. At the most general level, the evidence is that scope can be implemented in terms of guard, but not vice versa. But what is really relevant is whether the difference affects practical usage.

The pin-with-guard approach allows the guard to be moved into a structure and returned from the function that did the pinning. This could come useful to data structures that want to provide efficient data access interfaces while hiding the implementation. Consider for example a lock-free container that provides temporary read-only access to the contained value. With the guard API, it is possible to provide a method like:

fn peek<'a>(&'a self) -> Read<'a, T>;

(Full implementation in the playground.)

peek has several nice properties:

  • It is reasonably efficient, as it ensures that the thread is pinned only once per peek and not re-pinned for every deref. The downside is that the caller must understand that holding on to the returned Read object will delay the GC.
  • It is ergonomic and feels natural - the Read wrapper works exactly like similar wrappers returned by other Rust APIs that implement deref such as RefCell::borrow.
  • It completely hides the use of Crossbeam, making epoch-based reclamation an implementation detail.

With scope-based pinning, achieving the same performance would require peek to switch signature to one that accepts a thunk invoked with the reference (or again an object that implements Deref). This is less elegant to use and document, but more importantly it disallows the usage where the call to peek and the access to the data are in sibling functions. For example, imagine a library that invokes FnOnce callbacks A, B, and C in the same thread in quick succession:

let a = A();
... quick work ...
let b = B(a);
... quick work ...
let c = C(b);
... continue normal work ...

With a container that provides peek it is possible for an implementation of A to peek into the container and pass the Read guard to B and then to C, which would finally drop it. (A cannot extract the needed data because it doesn't know which part of the referenced object B and C will use.) A container based on the scope API could not support this - a peek initiated inside A would have to receive a thunk and therefore end before A() completes. B and C would be forced to peek themselves, which would pin the thread more times than necessary.

Even if we made A, B, and C methods on the same object, it would still not work. Unless the constructor itself receives a scope (which requires the caller to be aware of Crossbeam/epoch reclamation), there is nothing it can do to provide the object's methods access to the scope.

I can understand that this can be construed as a feature, since it makes it harder to keep the thread pinned by passing the guard to potentially foreign code, or by simply dropping it. But there is nothing that stops a carelessly written thunk passed to (and invoked by) by coco::epoch::scope from blocking indefinitely, so safety is not guaranteed.

(Edit: this comment was written before I saw Vtec234's edit, which basically describes the same use case more briefly. Leaving the comment, as the longer description might still be useful.)

@jeehoonkang

So each guard should have its own bag. That means all the guards derived from a single handle share only the local epoch in the registry. Am I understanding correctly?

Hmm, not quite. All guards derived from a single handle share the Local, and this Local contains the bag. It's not much different from scopes.

Also, we need to somehow track the number of (guards + pins) for each handle. We can store it in the registry or we can manage it in an Rc shared by a handle and guards. Is it correct?

Yeah, we must track the number of pins, like you did with pin_depth.

I implemented Guard in an experimental branch: https://github.com/jeehoonkang/crossbeam-epoch/tree/guard

This looks decent, but we can avoid reference counting. :)

If we don't need to get a guard from a handle, then the implementation can be much simpler: https://github.com/jeehoonkang/crossbeam-epoch/tree/guard-from-collector

I think pinning would be too slow here because each guard contains an Arc<Global>. Atomic reference counting is already kinda slow, but now all threads contend on the single refcount whenever a guard is created/dropped.

I have a branch that completely replaces scopes with guards. While I haven't benchmarked anything, performance should be comparable to what it was before. Also, the code is not super clean, but you'll get the idea.

@Vtec234

If there exists an important use case for them, maybe they're a good idea, but otherwise, I'm unsure whether adding additional complexity to the system is worth it.

To be clear, I don't think we should have both scopes and guards, but only one of those. Note that guards are strictly more powerful than scopes since you can implement scopes using guards:

struct Scope(Guard);

fn pin_scope<R, F: FnOnce(&Scope) -> R>(&self, f: F) -> R {
    let guard = pin_guard();
    let scope = Scope(guard);
    f(&scope)
}

You can't implement guards using scopes in a similar manner.

With the addition of safepoints we can effectively have quiescent periods, but what does adding Guards achieve that was not possible or otherwise difficult to do before?

The benefits of guards are twofold.

First, ergonomics. Compare the following:

epoch::pin(|scope| {
    // ...
});

unsafe {
    epoch::unprotected(|scope| {
        // ...
    })
}

let guard = &epoch::pin();
// ...

let guard = unsafe { &epoch::unprotected() };
// ...

Second, flexibility. @jeehoonkang and @hniksic have expressed a desire for returning references/iterators/whatever that keep the current thread pinned for their entire lifetime. This is simply not possible with scopes.

On the other hand, something like with_value<FnOnce(&Val) -> R>(f: F) -> R makes the scope more explicit, which is much nicer in my opinion.

You could make the exact same argument about Mutexes or RefCells.

// Why have guards...
let guard = mutex.lock();
let val = refcell.borrow_mut();

// ...instead of scopes? Scopes are more explicit.
mutex.lock(|guard| {
    // ...
});
refcell.borrow_mut(|val| {
    // ...
});

But I actually agree with you here. The explicit with_value method seems like a good idea. And guards must never be silently hidden within returned data. Whenever a piece of data is keeping the current thread pinned, that must somehow be perfectly clear.

But, in the end, whether a data structure API is modeled using with_value(|| ...) or get_value() is a matter of tradeoffs. I don't think one of those two is always the correct answer, but with guards you can surely build both methods. With scopes you only get the first one.

One more thing. If we switch to guards, this will affect collectors and handles a little bit:

let c = Collector::new();
let h = c.handle();

let guard = h.pin();

thread::spawn(move || {
    // This is wrong. Now the same handle has guards in two different threads.
    let guard = h.pin();
})

We'll have to make Handles non-Send. In order to create a handle, one would access the Collector from the thread that needs to be registered and then call .handle().

Just as a watcher-from-the-shadows - mem::forget is a not a safety issue with guards, right? Worst that can happen is the collector gets suspended indefinitely?

Just as a watcher-from-the-shadows - mem::forget is a not a safety issue with guards, right? Worst that can happen is the collector gets suspended indefinitely?

Exactly. Memory gets leaked, but it's not a safety issue.

@stjepang I like your branch, and want it to be merged.

I have a minor comment on List in you branch. I noticed that List's iteration/insert API is no longer used, e.g.: crossbeam-rs/crossbeam-epoch@master...stjepang:guard#diff-6d7fb2f2e4b00caaa3efe3c37b5708e9R42 . What aspects of list API aren't fit into here?

What aspects of list API aren't fit into here?

The reason is that I need to perform a specific action whenever a node is unlinked:

unsafe {
    global.push_bag(&mut *c.local.bag.get(), guard);
    global.collect(guard);
    guard.defer(move || curr.into_owned())
}

We can probably support that in the List<T> interface somehow, but for quick demonstration purposes, I didn't bother and just implemented an ad-hoc linked list.

Thanks all for the detailed explanations, I no longer object to this change. It's slightly amusing that we're essentially going back to the original design of crossbeam now, but if it's more flexible and can be made sound, that's fine.

I wonder whether with this API safepoint() could still be safe - if we bind the lifetimes of Ptrs obtained from a Guard to that guard's lifetime, i.e. just make them borrow the guard and make the signature of safepoint fn safepoint(self) -> Guard, that should still prevent using previously borrowed Ptrs, which could've been from the previous epoch.

Nice, when we accepted the first rfc I had the felling the guard would eventually come up again to allow stuff like this https://github.com/arthurprs/burst-trie/blob/thread_safe/src/map.rs#L42 (code looks like crap, it was an experiment on top of my first rust project)

We'll have to make Handles non-Send. In order to create a handle, one would access the Collector from the thread that needs to be registered and then call .handle().

That is probably ok.