ngaut/unistore

panic on peer storage

hslam opened this issue · 1 comments

hslam commented

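When a node is removed, all peers in the Raft group are incorrectly set to tombstone state, not only the removed peer. The RemoveNode handling writes the tombstone state without checking whether the removed peer is the local one: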

case eraftpb.ConfChangeType_RemoveNode:
	// Remove this peer from cache.
	delete(d.peer.PeerHeartbeats, peerID)
	if d.peer.IsLeader() {
		delete(d.peer.PeersStartPendingTime, peerID)
	}
	delete(d.peer.followersSplitFilesDone, peerID)
	d.peer.removePeerCache(peerID)
	WritePeerState(d.ctx.raftWB, cp.region, rspb.PeerState_Tombstone, nil)
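
One way to picture the missing check is the standalone sketch below. It is not the unistore code and not necessarily what the eventual fix does; `PeerState`, `stateAfterRemove` and the constants are hypothetical stand-ins introduced only for illustration. The peer IDs are taken from the log in this report, where peer 542 applies the removal of peer 337.

```go
package main

import "fmt"

// PeerState is a hypothetical stand-in for rspb.PeerState.
type PeerState int

const (
	PeerStateNormal PeerState = iota
	PeerStateTombstone
)

// stateAfterRemove returns the state a local peer should persist after it
// has applied a RemoveNode conf change. Only the peer that was actually
// removed becomes a tombstone; every other peer keeps its normal state.
func stateAfterRemove(localPeerID, removedPeerID uint64) PeerState {
	if localPeerID == removedPeerID {
		return PeerStateTombstone
	}
	return PeerStateNormal
}

func main() {
	// Peer 542 applies the removal of peer 337 and must stay normal.
	fmt.Println(stateAfterRemove(542, 337) == PeerStateNormal)
	// Peer 337 applies its own removal and becomes a tombstone.
	fmt.Println(stateAfterRemove(337, 337) == PeerStateTombstone)
}
```

In the snippet above, the equivalent guard would presumably compare `peerID` with the local peer's ID before calling `WritePeerState` with the tombstone state.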

13:19.507 [applier.go:692] ["[region 236:23] 542 execute admin command. term 10, index 45, command cmd_type:ChangePeer change_peer:<change_type:RemoveNode peer:<id:337 store_id:8 > > "]
13:19.507 [applier.go:926] ["[region 236:23] 542 exec ConfChange, peer_id 337, type RemoveNode, epoch conf_ver:31 version:23 "]
13:19.507 [applier.go:973] ["[region 236:23] 542 remove peer successfully, peer id:337 store_id:8 , region id:236 start_key:\"t\\200\\000\\000\\000\\000\\000\\000\\3779\\000\\000\\000\\000\\000\\000\\000\\370\" end_key:\"t\\200\\000\\000\\000\\000\\000\\000\\377;\\000\\000\\000\\000\\000\\000\\000\\370\" region_epoch:<conf_ver:31 version:23 > peers:<id:337 store_id:8 > peers:<id:501 store_id:2 > peers:<id:542 store_id:10 > peers:<id:573 store_id:4 > "]
13:19.508 [fsm_peer.go:722] ["region 236:23 remove node [store 8 peer 337] from node [store 10 peer 542]"]

After the store is restarted, the peer is found in tombstone state when peers are loaded, so its metadata is cleared. Creating the peer storage then fails because the peer state data can no longer be read.

if localState.State == rspb.PeerState_Tombstone {
	tombStoneCount++
	ClearMeta(bs.ctx.engine.raft, raftWB, localState.Region)

13:51.207 [engine.go:474] ["load shard 236 ver 23"]
13:51.207 [recover.go:40] ["recover region:236 ver:23"]
14:01.356 [fsm_store.go:239] ["region 236:23 clear meta when loading peers"]
panic: [region 236] 542 unexpected raft log index: lastIndex 0 < appliedIndex 47

goroutine 111 [running]:
github.com/ngaut/unistore/tikv/raftstore.NewPeerStorage(0xc00feaff00, 0xc0198e6080, 0xc0053423c0, 0xc00075c080, 0xc0198ce390, 0x10, 0x10, 0x5b, 0x1f98b58)
	/Users/huangmeng/go/src/github.com/ngaut/unistore/tikv/raftstore/peer_storage.go:142 +0x77f
github.com/ngaut/unistore/tikv/raftstore.NewPeer(0xa, 0xc00000a5a0, 0xc00feaff00, 0xc0198e6080, 0xc0053423c0, 0xc00075c080, 0x2, 0x0, 0x0)
	/Users/huangmeng/go/src/github.com/ngaut/unistore/tikv/raftstore/peer.go:305 +0x172
github.com/ngaut/unistore/tikv/raftstore.replicatePeerFsm(0xa, 0xc00000a5a0, 0xc0053423c0, 0xc00feaff00, 0xec, 0xc0001c05e8, 0x12, 0x18, 0xc0001c0600, 0x12, ...)
	/Users/huangmeng/go/src/github.com/ngaut/unistore/tikv/raftstore/fsm_peer.go:98 +0x25d
github.com/ngaut/unistore/tikv/raftstore.(*storeMsgHandler).maybeCreatePeer(0xc00053e580, 0xec, 0xc01d3d58c0, 0x0, 0x0, 0x0)
	/Users/huangmeng/go/src/github.com/ngaut/unistore/tikv/raftstore/fsm_store.go:577 +0x28b
github.com/ngaut/unistore/tikv/raftstore.(*storeMsgHandler).onRaftMessage(0xc00053e580, 0xc01d3d58c0, 0x18cd160, 0x1930120)
	/Users/huangmeng/go/src/github.com/ngaut/unistore/tikv/raftstore/fsm_store.go:519 +0x4b6
github.com/ngaut/unistore/tikv/raftstore.(*storeMsgHandler).handleMsg(0xc00053e580, 0x65, 0xec, 0x19a9f00, 0xc01d3d58c0)
	/Users/huangmeng/go/src/github.com/ngaut/unistore/tikv/raftstore/fsm_store.go:178 +0x8a
github.com/ngaut/unistore/tikv/raftstore.(*storeWorker).run(0xc00000f1c0, 0xc020408180, 0xc01cb16ee0)
	/Users/huangmeng/go/src/github.com/ngaut/unistore/tikv/raftstore/peer_worker.go:385 +0x207
created by github.com/ngaut/unistore/tikv/raftstore.(*raftBatchSystem).startWorkers
	/Users/huangmeng/go/src/github.com/ngaut/unistore/tikv/raftstore/fsm_store.go:384 +0x3c9
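
The sequence in this report (tombstone state on a surviving peer, metadata cleared on load, then the index check failing) can be modelled with the small sketch below. It is a simplified illustration, not the unistore implementation: `lastIndexAfterLoad` and `checkIndexes` are hypothetical helpers, the pre-restart last index of 47 is an assumed value, and the sketch assumes the applied index survives the cleanup because it is recovered separately from the Raft metadata.

```go
package main

import "fmt"

// lastIndexAfterLoad models loading a peer: if the persisted state is
// tombstone, the Raft metadata is cleared and the last index reads as 0.
func lastIndexAfterLoad(tombstone bool, storedLastIndex uint64) uint64 {
	if tombstone {
		return 0 // models the effect of ClearMeta
	}
	return storedLastIndex
}

// checkIndexes mirrors the invariant in the panic message: the last log
// index must not be behind the applied index.
func checkIndexes(regionID, peerID, lastIndex, appliedIndex uint64) error {
	if lastIndex < appliedIndex {
		return fmt.Errorf("[region %d] %d unexpected raft log index: lastIndex %d < appliedIndex %d",
			regionID, peerID, lastIndex, appliedIndex)
	}
	return nil
}

func main() {
	const appliedIndex = 47 // from the panic message

	// Surviving peer wrongly marked tombstone: metadata is cleared.
	last := lastIndexAfterLoad(true, 47) // 47 is an assumed stored value
	fmt.Println(checkIndexes(236, 542, last, appliedIndex))

	// Same peer with a normal state: the check passes.
	last = lastIndexAfterLoad(false, 47)
	fmt.Println(checkIndexes(236, 542, last, appliedIndex))
}
```

The first call reproduces the text of the panic message for region 236 and peer 542; the second returns a nil error because the last index is not behind the applied index.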

hslam commented

Fixed by #664