Watching the server lock differently
The half-dead TabletServer problem occurs when the TabletServer's ZooKeeper session has timed out and its ephemeral lock node has been removed on the ZooKeeper server, but the TabletServer's ZooKeeper Watcher has not been notified. When the lock watcher is notified, the TabletServer halts itself. When it is not notified, the TabletServer may continue trying to write mutations for tablets that may now be hosted elsewhere, because the Manager has noticed the missing ZooKeeper lock and re-assigned the tablets.
Looking at the ZooKeeper client code, Watcher notifications occur in ClientCnxn.EventThread.run, specifically in the call to processEvent. There is one EventThread per ClientCnxn object, so if a Watcher callback takes a long time or gets stuck, the lock-loss notification may not arrive in a timely fashion.
One solution might be to create a new Thread in the server processes that uses a different ZooKeeper client object specifically to watch the server lock. This thread could call ZooKeeper.exists(path, false) periodically to check that the lock node exists without using a Watcher. It could run at a higher priority so that it has a better chance of getting cpu time when under load, but call Thread.sleep between iterations so that it yields to other threads for some amount of time.
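A minimal sketch of what such a thread might look like, assuming hypothetical names (LockCheckThread, lockPath) and using Runtime.halt as a stand-in for whatever the real server does when it loses its lock:

```java
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooKeeper;

public class LockCheckThread extends Thread {

  private final ZooKeeper zk;       // a dedicated client, not the shared one
  private final String lockPath;    // path of this server's ephemeral lock node
  private final long checkIntervalMs;

  public LockCheckThread(ZooKeeper zk, String lockPath, long checkIntervalMs) {
    this.zk = zk;
    this.lockPath = lockPath;
    this.checkIntervalMs = checkIntervalMs;
    setDaemon(true);
    setName("server-lock-check");
    // Higher priority improves the odds of getting cpu time under load.
    setPriority(Thread.MAX_PRIORITY);
  }

  @Override
  public void run() {
    while (true) {
      try {
        // Second argument false: a plain existence check, no Watcher.
        if (zk.exists(lockPath, false) == null) {
          // The ephemeral lock node is gone; stop before writing any more
          // mutations for tablets the Manager may have re-assigned.
          Runtime.getRuntime().halt(1);
        }
      } catch (KeeperException e) {
        // Transient ZooKeeper error; try again on the next iteration.
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return;
      }
      try {
        // Sleep between iterations so other threads get cpu time.
        Thread.sleep(checkIntervalMs);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return;
      }
    }
  }
}
```

Because this thread uses its own client, a stuck Watcher callback on the main client's EventThread cannot delay the check.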
Would be good if the tserver did a check of its lock after experiencing an error writing to its walog. Currently the following can happen:

1. The Manager revokes the lease on a tserver's walog when it determines the tserver is dead.
2. If the tserver is still alive, this causes a write error the next time it writes to its walog.
3. When the tserver has a write error, it just creates a new walog.

Would be nice to have the tserver check its lock between steps 2 and 3 above. That could be done by calling the same code that is used for the periodic check.
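A rough illustration of that check, with hypothetical names (WalogErrorHandler, onWalogWriteError, createNewWalog); it reuses the same exists-based check the periodic thread above performs:

```java
import java.io.IOException;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooKeeper;

public class WalogErrorHandler {

  private final ZooKeeper zk;
  private final String lockPath;

  public WalogErrorHandler(ZooKeeper zk, String lockPath) {
    this.zk = zk;
    this.lockPath = lockPath;
  }

  // Called after a write to the current walog fails (step 2 above).
  public void onWalogWriteError(IOException cause)
      throws KeeperException, InterruptedException {
    // Before creating a new walog (step 3), verify the lock still exists.
    // If the Manager revoked the lease because it decided this tserver is
    // dead, the lock node is gone and continuing to write risks writing
    // mutations for tablets that are hosted elsewhere.
    if (zk.exists(lockPath, false) == null) {
      Runtime.getRuntime().halt(1);
    }
    createNewWalog();
  }

  private void createNewWalog() {
    // open a replacement write-ahead log (omitted)
  }
}
```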
An alternative to a dedicated thread that periodically checks is to have a dedicated zookeeper object for lock acquisition. Not sure if this is better or worse overall. It would create an extra zookeeper connection per server, but would avoid the constant pings to zookeeper, allowing Accumulo to rely on the zookeeper notification mechanisms.
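A sketch of that alternative, again with hypothetical names: the second client's event thread carries only the lock watcher, so a NodeDeleted event for the lock cannot be stuck behind unrelated callbacks.

```java
import java.io.IOException;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.Watcher.Event.EventType;
import org.apache.zookeeper.ZooKeeper;

public class DedicatedLockClient {

  public static ZooKeeper watchLock(String connectString, int sessionTimeoutMs,
      String lockPath, Runnable onLockLoss)
      throws IOException, KeeperException, InterruptedException {
    // This client is used for nothing but the lock, so its single
    // EventThread never runs any other Watcher callbacks.
    ZooKeeper zk = new ZooKeeper(connectString, sessionTimeoutMs, event -> {
      if (event.getType() == EventType.NodeDeleted
          && lockPath.equals(event.getPath())) {
        onLockLoss.run();
      }
      // Note: ZooKeeper watches are one-shot; a real implementation would
      // re-register the watch after other event types fire.
    });
    // Register the watch on the lock node via the default watcher.
    if (zk.exists(lockPath, true) == null) {
      throw new IllegalStateException("lock node missing: " + lockPath);
    }
    return zk;
  }
}
```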
This periodic check may not be needed on all server types. It's most important on tablet servers and the manager; compactors and scan servers could forgo the check in order to avoid introducing extra load on zookeeper.
This is an example of code that does a lot of work when processing a zookeeper event: it reads from zookeeper inside the event callback. That type of activity can tie up the single event thread and prevent it from processing other events, like the one for a lost lock. Posting this as an example of why this change could be useful.
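A generic sketch of that pattern (not the actual code being referenced): a Watcher whose process() method performs synchronous reads back against the same client.

```java
import java.util.List;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;

public class BusyWatcher implements Watcher {

  private final ZooKeeper zk;

  public BusyWatcher(ZooKeeper zk) {
    this.zk = zk;
  }

  @Override
  public void process(WatchedEvent event) {
    if (event.getPath() == null) {
      return; // connection-state event, nothing to read
    }
    try {
      // These synchronous calls run on ClientCnxn's single EventThread.
      // Each one is a round trip to zookeeper, and while they block, a
      // lost-lock notification sits unprocessed in the event queue.
      List<String> children = zk.getChildren(event.getPath(), true);
      for (String child : children) {
        zk.getData(event.getPath() + "/" + child, false, null);
      }
    } catch (KeeperException | InterruptedException e) {
      // error handling omitted
    }
  }
}
```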
> It would create an extra zookeeper connection per server, but would avoid the constant pings to zookeeper, allowing Accumulo to rely on the zookeeper notification mechanisms.
I think that we are already creating more ZK client objects than we need anyway, which is something I'm addressing as part of #5124, so it may not be so bad to have a separate one just for this purpose.