ngaut/unistore

panic on applier

hslam opened this issue · 2 comments

hslam commented
[2021/08/04 17:42:48.900 +08:00] [INFO] [applier.go:688] ["[region 24:10] 44 execute admin command. term 6, index 27, command cmd_type:BatchSplit splits:<requests:<split_key:\"t\\200\\000\\000\\000\\000\\000\\000\\377\\017\\000\\000\\000\\000\\000\\000\\000\\370\" new_region_id:71 new_peer_ids:72 new_peer_ids:73 > right_derive:true > "]
[2021/08/04 17:42:48.900 +08:00] [INFO] [applier.go:1047] ["24:10 split region id:24 start_key:\"t\\200\\000\\000\\000\\000\\000\\000\\377\\r\\000\\000\\000\\000\\000\\000\\000\\370\" end_key:\"t\\200\\000\\000\\000\\000\\000\\000\\377\\021\\000\\000\\000\\000\\000\\000\\000\\370\" region_epoch:<conf_ver:3 version:10 > peers:<id:25 store_id:1 > peers:<id:44 store_id:3 > , keys [[116 128 0 0 0 0 0 0 255 13 0 0 0 0 0 0 0 248] [116 128 0 0 0 0 0 0 255 15 0 0 0 0 0 0 0 248] [116 128 0 0 0 0 0 0 255 17 0 0 0 0 0 0 0 248]]"]
[2021/08/04 17:42:48.900 +08:00] [WARN] [applier.go:995] ["24:10 is not in split file done stage for finish split, pause apply"]
[2021/08/04 17:42:48.900 +08:00] [ERROR] [applier.go:598] ["execute raft command region_id 24, peer_id 44, err finish-split wrong stage"] [stack="github.com/ngaut/unistore/tikv/raftstore.(*applier).applyRaftCmd\n\t/Users/huangmeng/go/src/github.com/ngaut/unistore/tikv/raftstore/applier.go:598\ngithub.com/ngaut/unistore/tikv/raftstore.(*applier).processRaftCmd\n\t/Users/huangmeng/go/src/github.com/ngaut/unistore/tikv/raftstore/applier.go:558\ngithub.com/ngaut/unistore/tikv/raftstore.(*applier).handleRaftEntryNormal\n\t/Users/huangmeng/go/src/github.com/ngaut/unistore/tikv/raftstore/applier.go:481\ngithub.com/ngaut/unistore/tikv/raftstore.(*applier).handleRaftCommittedEntries\n\t/Users/huangmeng/go/src/github.com/ngaut/unistore/tikv/raftstore/applier.go:434\ngithub.com/ngaut/unistore/tikv/raftstore.(*applier).handleApply\n\t/Users/huangmeng/go/src/github.com/ngaut/unistore/tikv/raftstore/applier.go:1209\ngithub.com/ngaut/unistore/tikv/raftstore.(*applier).handleMsg\n\t/Users/huangmeng/go/src/github.com/ngaut/unistore/tikv/raftstore/applier.go:1280\ngithub.com/ngaut/unistore/tikv/raftstore.(*applyWorker).run\n\t/Users/huangmeng/go/src/github.com/ngaut/unistore/tikv/raftstore/peer_worker.go:318"]
[2021/08/04 17:42:48.905 +08:00] [INFO] [peer.go:1068] ["follower 24:10 add extra message split file done"]
[2021/08/04 17:42:49.096 +08:00] [INFO] [flush.go:193] ["shard 24:10 added 1 l0s"]
[2021/08/04 17:42:49.096 +08:00] [INFO] [worker.go:541] ["shard 24:10 apply change set flush stage PRE_SPLIT_FLUSH_DONE seq 25"]
[2021/08/04 17:42:49.096 +08:00] [INFO] [shard.go:352] ["shard 24:10 set split stage PRE_SPLIT_FLUSH_DONE"]
[2021/08/04 17:42:49.096 +08:00] [INFO] [worker.go:541] ["shard 24:10 apply change set split_files stage SPLIT_FILE_DONE seq 26"]
[2021/08/04 17:42:49.096 +08:00] [INFO] [shard.go:352] ["shard 24:10 set split stage SPLIT_FILE_DONE"]
panic: [region 24:10] 44 expect index 28, but got 27

goroutine 77 [running]:
github.com/ngaut/unistore/tikv/raftstore.(*applier).handleRaftCommittedEntries(0xc0001b4840, 0xc00047a850, 0xc000290340, 0x1, 0x1)
	/Users/huangmeng/go/src/github.com/ngaut/unistore/tikv/raftstore/applier.go:429 +0xe5f
github.com/ngaut/unistore/tikv/raftstore.(*applier).handleApply(0xc0001b4840, 0xc00047a850, 0xc002b02570)
	/Users/huangmeng/go/src/github.com/ngaut/unistore/tikv/raftstore/applier.go:1209 +0x125
github.com/ngaut/unistore/tikv/raftstore.(*applier).handleMsg(0xc0001b4840, 0xc00047a850, 0x133, 0x18, 0x0, 0x0)
	/Users/huangmeng/go/src/github.com/ngaut/unistore/tikv/raftstore/applier.go:1294 +0x18e
github.com/ngaut/unistore/tikv/raftstore.(*applyWorker).run(0xc0002649f0, 0xc000136030)
	/Users/huangmeng/go/src/github.com/ngaut/unistore/tikv/raftstore/peer_worker.go:318 +0x109
created by github.com/ngaut/unistore/tikv/raftstore.(*raftBatchSystem).startWorkers
	/Users/huangmeng/go/src/github.com/ngaut/unistore/tikv/raftstore/fsm_store.go:372 +0x25b
hslam commented

When execute a raft command, if we need to pause applying this command and execute it next time, we can't set applied Index to this entry index.

a.applyState.appliedIndex = aCtx.execCtx.index
cs := buildSplitChangeSet(a.region, regions, a.applyState)
err = aCtx.engines.kv.FinishSplit(cs)
if err != nil {
if err == engine.ErrFinishSplitWrongStage {
// This must be a follower that fall behind, we need to pause the apply and wait for split files to finish
// in the background worker.
log.S().Warnf("%d:%d is not in split file done stage for finish split, pause apply",
a.region.Id, a.region.RegionEpoch.Version)
result = applyResult{tp: applyResultTypePause}
}
return
}

hslam commented

Fixed by #618