sofastack/sofa-jraft

Seems that readIndex() may return inconsistent value (e.g. 0) after node restart, is this a known issue?

sanpwc opened this issue · 6 comments

Within node restart lastCommittedIndex initialized as 0

public class BallotBox implements Lifecycle<BallotBoxOptions>, Describer {
    ...
    private long lastCommittedIndex = 0;
    ...
}

Taking into the consideration that local log application triggered by NodeImpl#init is asynchronous, it's possible to handle readIndex request before local log re-play is finished and thus see inconsistent readIndex value. In other words, seems that there's a race between readIndex evaluation and lastCommittedIndex restoration on node restart.

I think it's impossible, the handleReadIndexRequest will check the state of the node before reading the lastCommittedIndex, and if its state is not the leader or the current leader doesn't commit an entry before, the readIndex will fail.

if (this.logManager.getTerm(lastCommittedIndex) != this.currTerm) {

And when a node becomes a leader, it will update the lastCommittedIndex before releasing the write lock:

this.confCtx.flush(this.conf.getConf(), this.conf.getOldConf());

this.node.unsafeApplyConfiguration(conf, oldConf == null || oldConf.isEmpty() ? null : oldConf, true);

Hmm, seems that if (this.logManager.getTerm(lastCommittedIndex) != this.currTerm) will be skipped in case of quorum <= 1 and this is the case when the issue is reproduced.

Hmm, seems that if (this.logManager.getTerm(lastCommittedIndex) != this.currTerm) will be skipped in case of quorum <= 1 and this is the case when the issue is reproduced.

If so, that would be possible, good catch! But I really doubt if is this a common case(just one node) for the production usage of jraft.

We can fix it, but maybe it is not urgent.

We can fix it

Well, that'll be nice))

BTW, seems that it's not the difficult to fix it. In case of single noded cluster (and only single noded cluster). We may consider pendingIndex on restart as committed one, just because in case of single node no-one will discard log entries.
So basically, it should be possible to do following to solve the issue. In case of single noded clusters only.

    public boolean resetPendingIndex(final long newPendingIndex) {
            ...
            this.lastCommittedIndex = newPendingIndex - 1;
            ...
        }

What do you think?

We can fix it

Well, that'll be nice))

BTW, seems that it's not the difficult to fix it. In case of single noded cluster (and only single noded cluster). We may consider pendingIndex on restart as committed one, just because in case of single node no-one will discard log entries. So basically, it should be possible to do following to solve the issue. In case of single noded clusters only.

    public boolean resetPendingIndex(final long newPendingIndex) {
            ...
            this.lastCommittedIndex = newPendingIndex - 1;
            ...
        }

What do you think?

It seems like the fix will work as expected. Would you be able to create a pull request for this change?

Sure