Improve the cleaning mechanism of ShardNode
ZuLiangWang opened this issue · 0 comments
Describe this problem
We found that the failover mechanism of the HoraeDB cluster failed: the shard was not migrated when the machine went down.
Steps to reproduce
- Make the etcd root path configuration in HoraeDB and HoraeMeta inconsistent.
- Shut down a HoraeDB node.
Additional Information
- Add a drop `ShardNode` API to deal with some extreme situations.
- Add a new way to detect failed nodes, not only relying on etcd's lease event.
  - Use a background thread to continuously detect failed nodes.
  - Detect failed nodes through heartbeat.
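
To make the last proposal concrete, here is a minimal sketch of heartbeat-based detection driven by a background thread. It is written in Python purely for illustration and is not HoraeMeta code: `HeartbeatDetector`, `heartbeat`, `check_once`, the `on_failed` callback, and the timeout values are all hypothetical names and numbers. The idea is that each node reports a heartbeat, and a node that stays silent longer than the timeout is reported as failed, independently of any etcd lease event. The `on_failed` callback is where shard migration or a drop `ShardNode` call could be triggered.

```python
import threading
import time
from typing import Callable, Dict, List, Optional


class HeartbeatDetector:
    """Hypothetical sketch: track the last heartbeat of each node and
    report nodes whose heartbeat is older than a timeout."""

    def __init__(
        self,
        timeout_s: float,
        on_failed: Callable[[str], None],
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._timeout_s = timeout_s
        self._on_failed = on_failed  # assumed hook, e.g. trigger shard migration
        self._clock = clock          # injectable so the logic can be tested
        self._last_seen: Dict[str, float] = {}
        self._lock = threading.Lock()
        self._stop = threading.Event()
        self._thread: Optional[threading.Thread] = None

    def heartbeat(self, node: str) -> None:
        """Record that `node` is alive right now."""
        with self._lock:
            self._last_seen[node] = self._clock()

    def check_once(self) -> List[str]:
        """Return (and report) every node whose heartbeat has timed out."""
        now = self._clock()
        with self._lock:
            failed = sorted(
                node
                for node, seen in self._last_seen.items()
                if now - seen > self._timeout_s
            )
            # Forget failed nodes so each failure is reported only once.
            for node in failed:
                del self._last_seen[node]
        # Invoke the callback outside the lock to avoid blocking heartbeats.
        for node in failed:
            self._on_failed(node)
        return failed

    def start(self, interval_s: float) -> None:
        """Run check_once() every `interval_s` seconds in a background thread."""
        def loop() -> None:
            # Event.wait returns False on timeout and True once stop() is called.
            while not self._stop.wait(interval_s):
                self.check_once()

        self._thread = threading.Thread(target=loop, daemon=True)
        self._thread.start()

    def stop(self) -> None:
        """Stop the background thread and wait for it to exit."""
        self._stop.set()
        if self._thread is not None:
            self._thread.join()
```

Because this check does not depend on etcd at all, it would presumably still notice a dead node in the scenario above, where the two sides disagree on the etcd root path. The timeout and check interval would need tuning against the real heartbeat period to avoid false positives.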