scalar-labs/scalar-jepsen

The final read phase failed after terminate-nemesis


What happened

Sometimes, after the nemesis had requested a C* crash, the final read phase failed because some nodes were still down.

Cause

terminate-nemesis alone is not enough to recover the cluster. It only requests that the existing failure injection stop; it does not wait for the crashed nodes to come back up. As a result, the final read phase can start before the cluster has recovered.

Solution

We need to wait for the cluster to recover before reading. Since terminate-nemesis has already started the recovery, it is enough to add a wait function just before the final read.
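
The wait could look roughly like the polling loop below. This is only a sketch in Python to illustrate the idea, not the code used in this repository: `wait_for_recovery`, the `is_up` probe, and the default timeout and interval are all hypothetical names and values chosen for the example.

```python
import time
from typing import Callable, Iterable, List


def wait_for_recovery(
    nodes: Iterable[str],
    is_up: Callable[[str], bool],
    timeout_s: float = 300.0,   # assumed default, not taken from the project
    interval_s: float = 5.0,    # assumed default, not taken from the project
) -> None:
    """Block until every node reports up, or raise TimeoutError.

    `is_up` is a hypothetical probe supplied by the caller, for example a
    function that tries to open a client connection to the node.
    """
    all_nodes: List[str] = list(nodes)
    deadline = time.monotonic() + timeout_s
    while True:
        # Re-check every node on each round, since a node that answered
        # once may still be restarting.
        down = [n for n in all_nodes if not is_up(n)]
        if not down:
            return
        if time.monotonic() >= deadline:
            raise TimeoutError(f"nodes still down: {down}")
        time.sleep(interval_s)
```

In the test flow, a call like this would sit between terminate-nemesis and the final read, so that the read only starts once all nodes respond. Raising on timeout makes a cluster that never recovers show up as a distinct error rather than as a failed read.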