lukechampine/us

BoltMetaDB.ForEachBlob can cause a deadlock

jkawamoto opened this issue · 9 comments

If the callback function passed to ForEachBlob calls another DB method, such as DeleteBlob or Metadata, then, if I'm not mistaken, those methods try to acquire the lock again and cause a deadlock.

Even if the callback doesn't try to update the DB, a deadlock can happen:

If a goroutine holds a RWMutex for reading and another goroutine might call Lock, no goroutine should expect to be able to acquire a read lock until the initial read lock is released. In particular, this prohibits recursive read locking. This is to ensure that the lock eventually becomes available; a blocked Lock call excludes new readers from acquiring the lock.

See also: https://golang.org/pkg/sync/#RWMutex.

A deadlock should only occur if fn modifies the database. I've confirmed locally that this does not deadlock:

db.AddBlob(DBBlob{Key: []byte("foo")})
db.AddBlob(DBBlob{Key: []byte("bar")})
db.AddBlob(DBBlob{Key: []byte("baz")})
db.ForEachBlob(func(k []byte) error {
	b, _ := db.Blob(k)
	fmt.Println(string(b.Key))
	return nil
})

It looks like your implementation (kvsMetaDB) was wrapping the db calls in a separate mutex, which is (usually) unnecessary -- the database has its own internal mutex.

I'll update the docstring to reflect that fn must not modify the db.

As I mentioned, we update the DB from ForEachBlob and get a deadlock. And even if fn doesn't try to update the DB, a deadlock can still happen; see the documentation quoted above. So we need to add another lock as a workaround.

Ok, but that's a consequence of your implementation. There's not really anything that I can do about it.

No. Unfortunately, this is an API problem. For example, in this code

db.ForEachBlob(func(k []byte) error {
	b, _ := db.Blob(k)
	fmt.Println(string(b.Key))
	return nil
})

both ForEachBlob and Blob call bdb.View, but bbolt DB doesn't allow it:

Transactions should not depend on one another and generally shouldn't be opened simultaneously in the same goroutine. This can cause a deadlock as the read-write transaction needs to periodically re-map the data file but it cannot do so while any read-only transaction is open. Even a nested read-only transaction can cause a deadlock, as the child transaction can block the parent transaction from releasing its resources.

(https://github.com/etcd-io/bbolt#transactions)

ForEachBlob needs to pass a transaction object to the callback function, and inside the callback, every request has to go through that transaction object.

You can try this (the delayed AddBlob opens a read-write transaction while ForEachBlob's read-only transaction is still open, and the nested db.Blob call then blocks behind it):

db.AddBlob(DBBlob{Key: []byte("foo")})
db.AddBlob(DBBlob{Key: []byte("bar")})

var wg sync.WaitGroup
wg.Add(1)
go func() {
	defer wg.Done()
	db.ForEachBlob(func(k []byte) error {
		<-time.After(3 * time.Second)
		b, _ := db.Blob(k)
		fmt.Println(string(b.Key))
		return nil
	})
}()

<-time.After(1 * time.Second)
db.AddBlob(DBBlob{Key: []byte("baz")})
wg.Wait()

Ah, I see. I was confused because earlier you linked to the sync.RWMutex documentation. You are correct, my apologies.

I wonder -- do we need the ForEachBlob method to be part of the interface? It doesn't appear to be used anywhere. Perhaps it would be best to simply remove it for now. Since your kvsMetaDB has direct access to the DB, you should be able to implement any functionality you need directly.

I should have shown bbolt's documentation first...

Our kvsMetaDB is a wrapper around renterutil.MetaDB and cannot access BoltMetaDB.bdb since it's private. That means kvsMetaDB can only access the DB via the functions renterutil.MetaDB provides. In our use case, especially when sharing buckets, ForEachBlob is an important function because users don't know what objects are stored in the shared buckets.

I see. Ok, I can think of three options:

  1. You can copy the BoltMetaDB code entirely, giving you full control over the implementation. The downside is that your implementation will be decoupled from renterutil.BoltMetaDB.
  2. BoltMetaDB could export its bdb field. The downside is that implementation details would be exposed.
  3. The MetaDB interface could be overhauled to use a transaction-based API. The downside is that it would be a lot of work and using the API would become more verbose/awkward.

I think option 1 is the most practical at this time, but in the long run I might have to go with option 3.

I think (1) and (3) amount to the same thing; the difference is just who implements it. 😅 For now, we use another lock to prevent the deadlock, and we're fine waiting for the new API.