hashicorp/raft

About the implementation of catching up logs


A quote from section 6 of the Raft paper:

To avoid availability gaps, Raft introduces an additional phase before the configuration change, in which the new servers join the cluster as non-voting members (the leader replicates log entries to them, but they are not considered for majorities).

The current implementation proceeds as follows:

  1. Add the non-voting server to the configuration via AddNonvoter (a minimal call sketch follows this list)
  2. Dispatch the configuration change log entry to the old followers
  3. Update the latest configuration
  4. Start catching the new server up
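
For reference, step 1 maps onto the library's public API as follows. A minimal sketch, assuming a *raft.Raft handle on the leader; the server ID and address here are made up:

import (
	"time"

	"github.com/hashicorp/raft"
)

// addCatchUpServer adds a new server as a non-voter: the leader starts
// replicating log entries to it, but it does not count toward quorum.
func addCatchUpServer(r *raft.Raft) error {
	f := r.AddNonvoter(
		raft.ServerID("node4"),              // hypothetical server ID
		raft.ServerAddress("10.0.0.4:8300"), // hypothetical address
		0,              // prevIndex: 0 means no configuration-index precondition
		10*time.Second, // timeout for starting the membership change
	)
	// Error() blocks until the configuration change is committed or fails.
	return f.Error()
}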

But from the current implementation, it seems that the server never becomes a Voter after catching up.

Finally, one question remains:

  • How can we know that it has caught up on the logs?
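
As far as I know, the leader does not publicly expose per-follower replication progress, so a common approach is to compare the new server's applied index against the leader's last log index; both are available via the library's AppliedIndex()/LastIndex() methods (or the "applied_index"/"last_log_index" entries in Stats()). A sketch of the check, assuming you ship the two numbers between nodes yourself, e.g. over an application-level RPC:

// caughtUp reports whether the new server is within maxLag entries of
// the leader. leaderLast would come from LastIndex() on the leader and
// followerApplied from AppliedIndex() on the new server; how they are
// collected is application-specific.
func caughtUp(leaderLast, followerApplied, maxLag uint64) bool {
	return leaderLast <= followerApplied+maxLag
}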

From what I remember, AddVoter actually does what you want. It should start as a NonVoter and automatically change to Voter once it's caught up. (You should confirm this though, as I'm not 100% sure.)
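
For reference, the call being suggested looks like this; a minimal sketch with a made-up ID and address (as the thread below establishes, today this grants the vote immediately rather than staging the server first):

import (
	"time"

	"github.com/hashicorp/raft"
)

func addVoter(r *raft.Raft) error {
	// Proposes adding node4 as a Voter; for a server already in the
	// configuration, this updates its suffrage instead.
	f := r.AddVoter(raft.ServerID("node4"), raft.ServerAddress("10.0.0.4:8300"), 0, 10*time.Second)
	return f.Error() // blocks until the change is committed or fails
}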

@JelteF It seems to be a TODO.

AFAIK, AddVoter will add the given server to the cluster and assign it a vote right away:

func nextConfiguration(current Configuration, currentIndex uint64, change configurationChangeRequest) (Configuration, error) {
	// ...
	configuration := current.Clone()
	switch change.command {
	case AddStaging:
		// TODO: barf on new address?
		newServer := Server{
			// TODO: This should add the server as Staging, to be automatically
			// promoted to Voter later. However, the promotion to Voter is not yet
			// implemented, and doing so is not trivial with the way the leader loop
			// coordinates with the replication goroutines today. So, for now, the
			// server will have a vote right away, and the Promote case below is
			// unused.
			Suffrage: Voter,
			ID:       change.serverID,
			Address:  change.serverAddress,
		}
		// ...
	case AddNonvoter:
		// ...
	}
	// ...
	return configuration, nil
}
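
// quorumSize counts only servers whose Suffrage is Voter, so a
// catching-up Nonvoter never changes the majority calculation.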
func (r *Raft) quorumSize() int {
	voters := 0
	for _, server := range r.configurations.latest.Servers {
		if server.Suffrage == Voter {
			voters++
		}
	}
	return voters/2 + 1
}
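
To make that concrete: with 3 Voters and 2 Nonvoters in the latest configuration, voters is 3 and quorumSize() returns 3/2 + 1 = 2, so the two catching-up servers affect neither commitment nor elections until their Suffrage changes.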

IMHO, the non-voting server should be added to the cluster by AddNonvoter and then be automatically promoted to Voter later, once it has caught up.
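
Putting those pieces together, a hand-rolled promotion flow might look like the sketch below. It inlines the same catch-up check as caughtUp above, takes a hypothetical fetchIndexes callback for collecting the two indexes, and relies on the behavior (visible in the AddStaging case) that calling AddVoter for a server already in the configuration updates its suffrage to Voter:

import (
	"time"

	"github.com/hashicorp/raft"
)

// addThenPromote joins a server as a non-voter, waits for it to catch
// up, then promotes it. fetchIndexes is a hypothetical application
// helper returning the leader's LastIndex() and the new server's
// AppliedIndex().
func addThenPromote(r *raft.Raft, id raft.ServerID, addr raft.ServerAddress,
	fetchIndexes func() (leaderLast, followerApplied uint64)) error {

	// 1. Join as a non-voter: replication starts, quorum is unchanged.
	if err := r.AddNonvoter(id, addr, 0, 10*time.Second).Error(); err != nil {
		return err
	}

	// 2. Poll until the new server is within maxLag entries of the leader.
	const maxLag = 64 // hypothetical threshold; tune for your workload
	for {
		leaderLast, followerApplied := fetchIndexes()
		if leaderLast <= followerApplied+maxLag {
			break
		}
		time.Sleep(500 * time.Millisecond)
	}

	// 3. Promote: AddVoter on an existing server updates its Suffrage.
	return r.AddVoter(id, addr, 0, 10*time.Second).Error()
}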

We recommend using https://github.com/hashicorp/raft-autopilot to get this functionality. It will add nodes as non-voters and then, after they've been "stable" for long enough and are keeping up, they'll be promoted to voters.

Happy to re-open if this doesn't help or if you still have questions.