Race in Cluster.onConnect that leaves cluster in shutdown state indefinitely?
jdanbrown opened this issue · 3 comments
I don't have hard evidence for this, but I think the following is possible:
- Receive `SyncConnected` for sessionid `"a"`
- Call `onConnect()`
  - zk session expires here; sessionid was `"a"`
  - Call `joinCluster()`
    - `zk.get().getSessionId` establishes a new session with sessionid `"b"` and returns `"b"`
    - `/<name>/nodes/<nodeID>` is created with connectionID `"b"`, lifetime bound to sessionid `"b"`
  - Complete `joinCluster()`, `onConnect()`, `SyncConnected`
- Receive `SyncConnected` for sessionid `"b"`
- Call `onConnect()`
  - Call `previousZKSessionStillActive()`
    - `previousZKSessionStillActive()` uses session `"b"` to read `/<name>/nodes/<nodeID>`, finds connectionID `"b"`, and returns `true`
  - `onConnect()` returns early, skipping cluster setup
And somewhere in there an `Expired` event for sessionid `"a"` shuts down the cluster (I'm not sure if it matters when), leaving the cluster in a shutdown state indefinitely, because `onConnect()` for sessionid `"b"` was tricked into skipping cluster setup.
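To make the race concrete, here is a minimal sketch of the shape that check could have (a guess, not the actual `Cluster` code; `zkGet`, the node-data format, and the comparison are assumptions of mine):

```scala
import org.apache.zookeeper.ZooKeeper

// Hypothetical sketch of the check that gets fooled. The names follow the
// description above; the real implementation may differ. `zkGet` stands in
// for zk.get(), and the connectionID is assumed to be stored as node data.
class SessionCheckSketch(zkGet: () => ZooKeeper, name: String, nodeID: String) {
  def previousZKSessionStillActive(): Boolean = {
    // Read /<name>/nodes/<nodeID> with the *current* session ("b")...
    val stored = new String(zkGet().getData(s"/$name/nodes/$nodeID", false, null), "UTF-8")
    // ...and compare it against the *current* session id, which is also "b",
    // so the check passes even though session "a" (the one we are supposedly
    // recovering from) is long gone.
    stored == zkGet().getSessionId.toString
  }
}
```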
I can now trip this somewhat reliably in my own code (unpublished) by starting a handful of nodes (~5-10) and putting them through a short series of brief zk connection losses.
I think the root problem is that the code sometimes makes several separate `zk.get()` calls in sequence while assuming that all of them interact with the same zk session, which is unsound because each `zk.get()` call can potentially start a new zk session.
Here's one simple example of where this problem can occur:

- `Cluster.joinCluster` calls `zk.get().getSessionId` to figure out what sessionId to put in its ephemeral `/nodes/<nodeId>`
- `ZKUtils.createEphemeral` calls `zk.get().create` to create the ephemeral node with the requested data

In the case where the two separate `zk.get()` calls return `ZooKeeper` instances for different sessions, we end up with a `/nodes/<nodeId>` actually created by session B (visible by `stat`) that thinks it was created by session A (visible by `get`). I think this particular race wouldn't cause the failure outlined above, but there's a `zk.get()` race at the core of that one too.
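A sketch of that unsound pattern, with the two `zk.get()` calls inlined into one method (illustration only, not the actual `Cluster`/`ZKUtils` source; the path and data encoding are assumptions):

```scala
import org.apache.zookeeper.{CreateMode, ZooDefs, ZooKeeper}

object UnsoundPatternSketch {
  // Two zk.get() calls assumed to hit the same session, where the second may
  // silently be a brand-new session.
  def joinClusterSketch(zkGet: () => ZooKeeper, path: String): Unit = {
    // First zk.get(): returns the ZooKeeper for session A and its sessionId.
    val sessionId = zkGet().getSessionId

    // Second zk.get(): if session A expired in between, this returns a
    // different ZooKeeper bound to session B, so the ephemeral node is owned
    // by session B (visible via stat) but records session A in its data
    // (visible via get).
    zkGet().create(
      path,
      sessionId.toString.getBytes("UTF-8"),
      ZooDefs.Ids.OPEN_ACL_UNSAFE,
      CreateMode.EPHEMERAL)
  }
}
```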
As for a fix, I'm imagining something along these lines:

- Indirect all existing calls to `zk.get()` through a call to `currentZk()`. `currentZk()` calls `zk.get()` except when there's a fixed `ZooKeeper` instance to tie us to a single session, which we can unobtrusively communicate through a dynamic variable `fixedZk`:

  ```scala
  def currentZk(): ZooKeeper = fixedZk.value getOrElse zk.get()
  val fixedZk = new DynamicVariable[Option[ZooKeeper]](None) // No fixed zk session by default
  ```

- To fix a zk session for a particular block of code, wrap the block with `zkSingleSession { ... }`, which we can define as:

  ```scala
  def zkSingleSession[X](x: => X): X = fixedZk.withValue(Some(zk.get())) { x }
  ```
Then any `zkSingleSession { body }` is guaranteed to observe only one `ZooKeeper` instance, and thus only one session. If it ever tries to use a `ZooKeeper` whose session has expired, it will trigger a `SessionExpiredException` that `body` probably can't meaningfully handle, in which case `body` will be aborted, and the caller of `zkSingleSession` will be responsible for deciding whether to fail or retry.
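For instance, wiring `joinCluster` through these helpers could look roughly like this (a sketch under the same assumptions as above; the class, the `zkGet` constructor parameter, and the node-data format are placeholders, not the real code):

```scala
import org.apache.zookeeper.{CreateMode, ZooDefs, ZooKeeper}
import scala.util.DynamicVariable

// Sketch only: `zkGet` stands in for the existing zk.get().
class SingleSessionSketch(zkGet: () => ZooKeeper) {
  val fixedZk = new DynamicVariable[Option[ZooKeeper]](None) // No fixed zk session by default

  def currentZk(): ZooKeeper = fixedZk.value getOrElse zkGet()

  def zkSingleSession[X](x: => X): X = fixedZk.withValue(Some(zkGet()))(x)

  // Both the getSessionId and the create now observe the same ZooKeeper
  // instance; if that session expires mid-block, the resulting
  // SessionExpiredException aborts the block and the caller decides whether
  // to fail or retry.
  def joinCluster(path: String): Unit = zkSingleSession {
    val sessionId = currentZk().getSessionId
    currentZk().create(
      path,
      sessionId.toString.getBytes("UTF-8"),
      ZooDefs.Ids.OPEN_ACL_UNSAFE,
      CreateMode.EPHEMERAL)
  }
}
```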
This bug seems to amplify a ginormous design flaw in Twitter's ZooKeeper API, namely a method that does two things (retrieve, or create and retrieve) instead of one. I proposed a fix some time ago, but job commitments got in the way. Your bug report shows this question should be addressed as soon as possible! Maybe we can start at ZKClient and move up the stack?
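For example, the split could look something like this (purely a hypothetical interface to illustrate the separation, not the actual ZKClient API):

```scala
import org.apache.zookeeper.ZooKeeper

// Hypothetical separation of the two concerns that the current method mixes.
trait ZkSessionProvider {
  /** The currently connected ZooKeeper, if any; never connects as a side effect. */
  def current(): Option[ZooKeeper]

  /** Explicitly establishes (or re-establishes) a session; callers opt in. */
  def connect(): ZooKeeper
}
```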