gfx-rs/wgpu

Lock analysis is not prepared to handle SnatchLock usage

jimblandy opened this issue · 4 comments

In #5475, we worked around a deadlock by passing a SnatchGuard by value to maintain. However, this means that SnatchGuards and other kinds of lock guards are not acquired and dropped in a stack-like fashion, which is an assumption made by the locking analysis in lock::ranked.

Specifically, Global::queue_submit acquires a SnatchGuard on the device, and then acquires a read guard on Device::fence. However, the SnatchGuard is passed by value to Device::maintain, which drops it, while the read guard goes out of scope in queue_submit after maintain has returned.

Thus, the SnatchGuard is both acquired first and dropped first, which is a violation of the stacking order, which the lock rank checker assumes.

Kudos to @ErichDonGubler for noting in the review of #5539 that the checker assumes stack-like acquisition and dropping, and suggesting that we actually check that this is the case.

It does seem that SnatchGuard is the only thing that is not handled in a stack-like fashion.

The analysis could be changed to handle arbitrary drop order.

  • Require that the ranks in the define_lock_ranks! macro call appear in order of increasing rank. (At the moment, ranks can appear in any order, but it would probably be clearer to have them in order anyway.)
  • Let the per-thread state be:
    • a LockRankSet in which bits are set and cleared as locks of the given rank are acquired and released, and
    • an array of Option<Location> values indexed by rank bit number, indicating where each currently held lock was acquired.
  • Acquisition applies u64::ilog2 to the LockRankSet to determine the highest-ranked lock currently held, and look that up in a global table of ranks built by the define_lock_ranks! macro to see which new locks are allowed to be acquired.
  • Lock guards track only their lock's rank bit, so that they can clear it when they're dropped.

This preserves the property that threads only acquire locks listed as a follower of the highest-ranked lock currently held, so the chain of threads blocked on other threads always follows a path of locks of increasing rank, which is the property that ensures there is an unblocked thread, holding the lock of highest rank.

If you squint, the current implementation maintains a stack of states, where each lock guard holds the state to be restored when it is dropped. This replaces that scattered stack with a single bitmask, and uses ilog2 to scan the bitmask to find the maximum-ranked held lock to validate an acquisition.

But it might be easier to just change the code to remain stack-like.

But it might be easier to just change the code to remain stack-like.

I agree we should try to get back to stack like.