dgraph-io/badger

Deadlock when updating keys while iterating over them

jogramming opened this issue · 7 comments

I don't see it stated anywhere that updating values while iterating isn't supported, so I'm assuming I'm not doing anything wrong?

Using the latest git version as of this writing.

Running the following for a while (usually about 10 seconds for me) causes a deadlock; you can tell it has happened when the counter printed every second stays at 0:

package main

import (
	"fmt"
	"github.com/dgraph-io/badger"
	"math/rand"
	"os"
	"os/signal"
	"sync/atomic"
	"syscall"
	"time"
)

var (
	DB          *badger.DB
	updateCount = new(int64)
)

func panicErr(err error) {
	if err != nil {
		panic(err)
	}
}

func main() {
	opts := badger.DefaultOptions
	opts.Dir = "db"
	opts.ValueDir = "db"
	opts.SyncWrites = false

	var err error
	DB, err = badger.Open(opts)
	panicErr(err)

	Load(0, 0xffff)
	go ReaderWriter()

	sc := make(chan os.Signal, 1)
	signal.Notify(sc, syscall.SIGINT, syscall.SIGTERM, os.Interrupt, os.Kill)

	ticker := time.NewTicker(time.Second)
	for {
		select {
		case <-sc:
			DB.Close()
			return
		case <-ticker.C:
			numRead := atomic.SwapInt64(updateCount, 0)
			fmt.Println("Read: ", numRead)
		}
	}
}

const valueSize = 0xff

func Load(startOffset int64, amount int64) {

	d := make([]byte, valueSize)

	for i := int64(0); i < amount; i++ {
		panicErr(DB.Update(func(txn *badger.Txn) error {
			rand.Read(d)
			return txn.Set([]byte(fmt.Sprint(i+startOffset)), d)
		}))
	}
}

func ReaderWriter() {
	for {
		// Iterate over the db, reading and updating values
		err := DB.Update(func(txn *badger.Txn) error {

			opts := badger.DefaultIteratorOptions
			opts.PrefetchValues = false
			it := txn.NewIterator(opts)
			defer it.Close()
			i := 0
			for it.Rewind(); it.Valid(); it.Next() {

				i++
				if i >= 100000 {
					break
				}

				atomic.AddInt64(updateCount, 1)

				item := it.Item()
				key := item.Key()

				newValue := make([]byte, valueSize)
				rand.Read(newValue)
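				// This read takes a read lock on the value log file.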
				item.Value()
				err := txn.Set(key, newValue)
				if err != nil {
					return err
				}
			}

			return nil
		})
		panicErr(err)
	}
}

I believe the issue is that the callback which releases the rlock on the logFile, registered among the transaction callbacks here https://github.com/dgraph-io/badger/blob/master/iterator.go#L84, is not called before batchSet is invoked here https://github.com/dgraph-io/badger/blob/master/transaction.go#L411, which sometimes asks for the lock again here https://github.com/dgraph-io/badger/blob/master/value.go#L130

You have an item.Value() in there. This fetches the value, acquiring a read lock on the value log file so that value log GC can't remove the file while it's being read. Meanwhile, the writes from this long-running iteration keep going to the value log, which eventually fills up and needs to be replaced, but that replacement can't happen while the read locks are held.

You could avoid this in one of three ways:

  1. Set PrefetchValues to true. This would read the value, copy it over and release the read lock.
  2. Do a pure key-only iteration, without calling Value().
  3. Store the keys that you need to write in a list, and then write them in a separate transaction after the iteration is over (see the sketch after this list).
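
For example, here is a minimal sketch of option 3, reusing DB and valueSize from the repro above (updateAllKeys is just an illustrative name, and for a large number of keys the second transaction may need to be split up to stay within transaction size limits):

func updateAllKeys() error {
	var keys [][]byte

	// Phase 1: key-only iteration, so no value log read locks are held.
	err := DB.View(func(txn *badger.Txn) error {
		opts := badger.DefaultIteratorOptions
		opts.PrefetchValues = false
		it := txn.NewIterator(opts)
		defer it.Close()
		for it.Rewind(); it.Valid(); it.Next() {
			// Key() is only valid until Next(), so copy it out.
			k := it.Item().Key()
			kc := make([]byte, len(k))
			copy(kc, k)
			keys = append(keys, kc)
		}
		return nil
	})
	if err != nil {
		return err
	}

	// Phase 2: write the new values in a separate transaction,
	// after the iteration is fully done.
	return DB.Update(func(txn *badger.Txn) error {
		for _, k := range keys {
			newValue := make([]byte, valueSize)
			rand.Read(newValue)
			if err := txn.Set(k, newValue); err != nil {
				return err
			}
		}
		return nil
	})
}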

I'm seeing users get stuck on this issue repeatedly. So it would make sense to have another function on Item that lets the user pass in a byte slice for the value to be written into; that way the read lock can be released immediately.

We also need to have a warning about this behavior in the README.

I'm also seeing this issue occur outside of iterators, rarely (though not so rarely in my use case): calling Value() on an Item and then updating it in the same transaction. I'm a bit too tired to put together a minimal reproducible case right now, but I can probably get to it in the morning.

This pending PR would add a ValueCopy function, which can be used to avoid deadlocks in long-running updating iterations.

Update: @jonas747 Can you run with my PR and check if you still see any deadlocks using the new ValueCopy function?

It doesn't seem to happen anymore after using ValueCopy, as far as I can see. But if I can make a request: it would be nice if we could pass in our own, say, io.Writer, so we could reuse that buffer.

Just pushed another commit, which modifies the function to accept a byte slice into which the data is written.
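
With that, the loop body in ReaderWriter can reuse a single buffer across iterations. A sketch, assuming the dst-based signature ValueCopy(dst []byte) ([]byte, error):

buf := make([]byte, valueSize)
for it.Rewind(); it.Valid(); it.Next() {
	item := it.Item()

	// Copies the value into buf (reallocating only if buf is too
	// small) and releases the value log read lock before returning.
	var err error
	buf, err = item.ValueCopy(buf)
	if err != nil {
		return err
	}

	newValue := make([]byte, valueSize)
	rand.Read(newValue)
	if err := txn.Set(item.Key(), newValue); err != nil {
		return err
	}
}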

The commit is in b3568eb. @deepakjois will cut a release branch sometime soon. Documentation is updated. Marking this issue as closed.