szferi/gomdb

Ineffective thread locking.

Opened this issue · 4 comments

The usage of runtime.LockOSThread() and runtime.UnlockOSThread() seems incorrect to me.

Child transactions will unlock their parents on Commit or Abort. Dealing with this is fairly trivial except maybe in the case of Cursor.Txn, where it seem to get only slightly more tricky.

The bigger problem in my opinion seems to be that there's nothing preventing usage of the transaction, or its cursors, inside another goroutine which, due to thread locking, is guaranteed by the Go runtime to execute on another OS thread.

I see two ways to deal with these problems.

  • Fix the problem with child transactions and place warning in godoc stating that write transactions (those without mdb.RDONLY) absolutely must not be used in any goroutine other than the one that created it. (edit: pull request #31)
  • Abstract this problem away by creating a new, locked goroutine in Env.BeginTxn and serialize all operations through it via channels. This would include serializing cursor creation and use through the same goroutine. Because MDB_NOTLS is used this would only be required for write transactions.

The latter option will introduce some overhead into write transactions. But I'm starting to think it's the only sane way to use MDB in Go.

Do you have any thoughts on this @szferi? I've spent a good amount of time looking into this but wouldn't be surprised if I've misinterpreted the rather spartan C docs. So please validate my conclusions yourself.

I opened #31 to make nested transactions safer and warn about unsafe behavior as I described in the first bullet above.

I think the next think to do is write more benchmarks so that any change made to guarantee thread safety can have its performance overhead quantified.

I'll try and write a couple benchmarks. But I don't want to stop anyone else 😉 It would be good for benchmarks to be as diverse in their use cases as possible. 😄

I'm getting looking into this problem. Can you attach actual test cases when the old code should generate unintended behavior and the #31 should fix that? (I know hard to come up good test cases for such a cases but at least we have to try)

I did attempt to put together a multi-threaded example which could produce 'unexpected' results. I was not able to do so in the ~hour I gave myself.

But as you said it is difficult to test. The C docs mandate serialization of write transactions to a single thread. They do not specify the effect when the integrating code does not obey the mandate. That omission doesn't imply it doesn't "work" in some sense (MDB_NOTLS?). It simply means you shouldn't be surprised if things go horribly wrong for you.

There is one somewhat reasonable test I can think of. On Linux the "syscall" package provides a mechanism for inspecting a goroutine's current thread identifier (an integer). So it wouldn't be hard to write a single platform test.

  • start long running concurrent stuff
  • open a write transaction
  • check thread id
  • open and close a nested transaction
  • check thread id
  • runtime.Gosched/time.Sleep... let other goroutines be scheduled
  • check thread id
  • close write transaction
  • (edit: kill) concurrent goroutines

The thread id should be the same at each call. That's the only assertion other from error handling.

Afaik the system call is somewhat different on other unixes. It's not provided in the syscall package on OS X. So you need to use a bulid constraint in the test file

// +build: linux

or something like that

(sorry for the comment spam yesterday)

FWIW, I've forked this repo and made a few changes. There is now a server (actor model) which enforces the one-writer many-readers requirements of LMDB. Use of runtime.LockOSThread means I now don't need the NOTLS at all.

There are some casualties - eg no nested transactions; and various other parts such as cursors I've not yet implemented. Best place to start is the examples in the server_test: https://github.com/msackman/gomdb/blob/master/server_test.go Comments welcome.