onyx-platform/onyx

Peer join race condition

lbradstreet opened this issue · 2 comments

Found by Jepsen.

What I know about this issue:

I initially thought this was caused by the failure monitor, but no leave-clusters are sent out after a certain point. Since peers that see leave-cluster simply shut themselves down and rejoin, I no longer think it's related. That said, I did see a couple of exceptions in the failure monitor, so peers may still be getting deadlocked as a result of those exceptions.

Tracking pending patch in #454.

I believe this has been fixed by #484, but I would not be surprised to see it pop up again. I will continue the Jepsen testing and close this for now.