awnumar/memguard

bufferList Data Race Can Occur in Panic

jpaskhay opened this issue · 3 comments

Describe the bug
Running into a transient data race in our benchmark tests where the bufferList is being written to while another goroutine tries to read it in the Panic function. Looking at the code, the Panic function is the only read performed on the bufferList.list without a read lock.

To Reproduce
It seems to be a timing issue in our benchmark, so it's tricky to reliably reproduce. In our specific failure, It seems to occur as a result of an mlock request failing (resulting in Panic being called) at the same time as a write to bufferList from a Buffer.Destroy call. Here are the relevant snippets of the data race stacks:

WARNING: DATA RACE
Read at 0x00c000098798 by goroutine 51:
  github.com/awnumar/memguard/core.Panic()
      /go/pkg/mod/github.com/awnumar/memguard@v0.19.1/core/exit.go:65 +0x133
  github.com/awnumar/memguard/core.NewBuffer()
      /go/pkg/mod/github.com/awnumar/memguard@v0.19.1/core/buffer.go:75 +0x775
  github.com/awnumar/memguard.NewBuffer()
      /go/pkg/mod/github.com/awnumar/memguard@v0.19.1/buffer.go:47 +0x3c
  github.com/awnumar/memguard.NewBufferRandom()
      /go/pkg/mod/github.com/awnumar/memguard@v0.19.1/buffer.go:215 +0x3c

...

Previous write at 0x00c000098798 by goroutine 25:
  github.com/awnumar/memguard/core.(*bufferList).remove()
      /go/pkg/mod/github.com/awnumar/memguard@v0.19.1/core/buffer.go:236 +0x21b
  github.com/awnumar/memguard/core.(*Buffer).Destroy()
      /go/pkg/mod/github.com/awnumar/memguard@v0.19.1/core/buffer.go:178 +0x238

Expected behaviour
No data race warning. This may be relatively minor since even after a fix it would still panic, but could still be preferred to avoid the data race.

Screenshots
N/A

System (please complete the following information):

  • OS and Kernel Versions: Debian GNU/Linux 10 (buster) (using the golang:1.12 Docker image in our builds)
  • Memguard Version: v0.19.1

Additional context
Will submit PR for possible fix.

Thanks for reporting this.

In the panic function I specifically left out the acquisition of locks since I assumed when panic is called then we have no guarantees about the state of the application and don't want a condition where we deadlock if a dead procedure has the lock.

Looking back at the code, I think this can only happen when the caller has acquired the lock to either the buffer list or the coffer and panics instead of returning. Also we should probably acquire the coffer lock so that we don't have a condition where panic is called during a re-key operation.

Going to attempt to write a unit test to trigger this race.

Pushed a patch in #127, would welcome your thoughts on it. It should fix the race condition above.

Pushed a patch in #127, would welcome your thoughts on it. It should fix the race condition above.

Cool. Should be able to circle back in the next day or so.