Raft specification leader staleness

Question

Raft specification leader staleness

spiliopoulos opened this issue 8 years ago · 6 comments

Hi,

I was going through the Raft spec and the following lines in the handleAppendEntriesResponse looked to me to be inverted

https://github.com/uwplse/verdi-raft/blob/master/raft/Raft.v#L244-L249

 Definition handleAppendEntriesReply (me : name) state src term entries (result : bool)
  : raft_data * list (name * msg) :=
....
....
    else if currentTerm state <? term then
      (* follower behind, ignore *)
      (state, [])
    else
      (* leader behind, convert to follower *)
      (advanceCurrentTerm state term, []).

It seems to me that if currentTerm state is less than the follower's term then the leader is behind and we should call advanceCurrentTerm to update its state. Similarly for the final case the follower is behind and we should leave state unchanged although calling advanceCurrentTerm on it would not make a difference.

Answer 1 · 2016-12-15T23:57:18.000Z

Whoa, good catch!

At first I thought, "how can this be, since the proofs check?" But I guess this is a "quality of implementation issue" rather than a safety issue, since at worst, it causes a leader to ignore a message it shouldn't have ignored. This could lead to a stale leader not learning of the new leader for longer than it would have otherwise, and maybe even to some liveness issues, but I think it is "safe".

Still, it is obviously a problem, and we should fix it.

Thanks very much for including a patch as well! I'll take a look at it.

Answer 2 · 2016-12-16T00:03:19.000Z

I think so too. My first reaction was the same and I went through the same exact thought process.
Congrats on the excellent work by the way.

Answer 3 · 2016-12-16T00:07:13.000Z

After talking with some colleagues, I think we have concluded that the branch where a leader receives this kind of message with a higher term is actually dead code. The argument is below. If we're right about that, then the code is technically correct, just obtuse.

The reason this can never happen is that the relevant message is an AppendEntries response. Such responses are only sent to AppendEntries requests. So in order for a leader to receive a response from a larger term, it would have had to be in that larger term to send the request to the follower. Since we have proved that terms never decrease, this can never happen.

Still, we would like to clean up the code, following your patch.

Answer 4 · 2016-12-16T00:15:07.000Z

Hmm, if I understand correctly you talk abou the case where

    else if currentTerm state <? term then

How about the case where there is a transient partition, a new leader B is elected (the term for all nodes except the stale leader A is updated), then the partition is healed and the stale leader A starts sending AppendEntries requests again. Wouldn't that part of the code run in that case?

Again it should not affect, the safety of the protocol since at some point the stale leader A would receive an AppendEntries request from the actual leader and realize what's happened.

Answer 5 · 2016-12-16T00:22:36.000Z

Ah, I think you're right. Man, distributed systems are hard.

Answer 6 · 2016-12-16T00:34:49.000Z

Tell me about it.