hashicorp/raft

Why is the leader channel notified BEFORE the new leader's log is up to date?

scottwis opened this issue · 2 comments

Inside of runLeader the call to overrideNotifyBool happens before the call to dispatchLogs

    func (r *Raft) runLeader() {
	r.logger.Info("entering leader state", "leader", r)
	metrics.IncrCounter([]string{"raft", "state", "leader"}, 1)

	// Notify that we are the leader
	overrideNotifyBool(r.leaderCh, true)
	
	//...
	
	// Dispatch a no-op log entry first. This gets this leader up to the latest
	// possible commit index, even in the absence of client commands. This used
	// to append a configuration entry instead of a noop. However, that permits
	// an unbounded number of uncommitted configurations in the log. We now
	// maintain that there exists at most one uncommitted configuration entry in
	// any log, so we have to do proper no-ops here.
	noop := &logFuture{log: Log{Type: LogNoop}}
	r.dispatchLogs([]*logFuture{noop})
	
	//...
    }

Is there a reason why the overrideNotifyBool call can't be placed after the dispatchLogs call?

The way it is implemented now, raft.LeaderCh() will receive notification that it has become the leader before it's state machine is up to date.

banks commented

The way it is implemented now, raft.LeaderCh() will receive notification that it has become the leader before it's state machine is up to date.

Hmm that is a good point, and I'm not sure there would be harm in moving the notify to after the dispatch, but that doesn't actually solve the problem the way your question implies it might as far as I can tell.

The log dispatch here only waits for the no-op log to be persisted locally, it's not waiting for any confirmations or commitment info from a quorum, synchronously, it's just an operation to make sure the new leader's replicating and comittment routines have been triggered by something in the absence of real client commands.

So even if the notify happens after that dispatch line, it still doesn't provide any stronger guarantee that the leader's commitments are more up to date as it's probably still not heard back from a quorum yet. And even if it has, this is only commit so there is still no guarantee that the leader has actually applied anything recent to its FSM which is what applications really care about.

The way the library is designed to be used in this case was actually asked about just a few days ago and I responded to earlier today in #508 .

Basically as a consumer of this library, you would trigger your "I am now the leader" routine based on the notification you get from this code, but then if it's critical that you don't accept any operation until the leader's FSM is up to date (i.e. you need linearizable consistency operations), then you have to use the raft.Barrier() call in your leader loop to wait for actual FSM apply to confirm that not only are all committed updates present, and all commitment state populated (which is what the line you linked is doing I think) but also that all updates have been applied to the local FSM.

Does that help?

Yes, it helps a bunch. Thanks.

I'll a Barrier to the go routine that receives the leader notifications.