bakwc/PySyncObj

broken leader re-election after killing most of cluster nodes

Closed this issue · 21 comments

I was randomly killing nodes of a 3-node cluster to verify an issue with leader re-election, checking the status with

➜  wspace cat clusterstatus.sh 
syncobj_admin -conn 127.0.0.1:6000 -status
syncobj_admin -conn 127.0.0.1:6001 -status
syncobj_admin -conn 127.0.0.1:6002 -status

➜  wspace bash clusterstatus.sh | egrep 'leader:|self:'
leader: localhost:6000
self: localhost:6000
leader: localhost:6000
self: localhost:6001
leader: localhost:6000
self: localhost:6002
➜  wspace bash clusterstatus.sh | egrep 'self:|leader:'
leader: localhost:6000
self: localhost:6000
leader: localhost:6000
self: localhost:6001
leader: localhost:6000
self: localhost:6002
➜  wspace bash clusterstatus.sh | egrep 'self:|leader:'
leader: localhost:6001
self: localhost:6001
leader: localhost:6001
self: localhost:6002

Here the leader should have been set to 6001, but node 6000 reported a None value instead:

➜  wspace bash clusterstatus.sh | egrep 'self:|leader:'
leader: None
self: localhost:6000
leader: localhost:6001
self: localhost:6001
leader: localhost:6001
self: localhost:6002
➜  wspace bash clusterstatus.sh | egrep 'self:|leader:'
leader: localhost:6001
self: localhost:6000
leader: localhost:6001
self: localhost:6001
leader: localhost:6001
self: localhost:6002
➜  wspace bash clusterstatus.sh | egrep 'self:|leader:'
leader: localhost:6002
self: localhost:6000
leader: localhost:6002
self: localhost:6001
leader: localhost:6002
self: localhost:6002
➜  wspace bash clusterstatus.sh | egrep 'self:|leader:'
leader: localhost:6001
self: localhost:6000
leader: localhost:6001
self: localhost:6001

And I reached this very interesting state where node 6000 has 6001 as its leader, but that leader isn't even active:

➜  wspace bash clusterstatus.sh | egrep 'self:|leader:'
leader: localhost:6001
self: localhost:6000

It was fixed after launching 6002 again:

➜  wspace bash clusterstatus.sh | egrep 'self:|leader:'
leader: localhost:6000
self: localhost:6000
leader: localhost:6000
self: localhost:6002

bakwc commented

Currently, a node's leader field resets only when the node receives messages from another node. It's not a bug: Raft requires more than half of the cluster to be alive to elect a new leader, so in your scenario this is normal behaviour.
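
For reference, a minimal sketch of the kind of 3-node setup being tested here (the ports, class name and polling loop are my own illustration, not the reporter's actual code):

import sys
import time

from pysyncobj import SyncObj, replicated


class Counter(SyncObj):
    def __init__(self, self_addr, partner_addrs):
        super(Counter, self).__init__(self_addr, partner_addrs)
        self.__value = 0

    @replicated
    def incr(self):
        self.__value += 1
        return self.__value


if __name__ == '__main__':
    # start one of the three nodes, e.g.: python node.py 6000 6001 6002
    self_port, partner_ports = sys.argv[1], sys.argv[2:]
    counter = Counter('localhost:%s' % self_port,
                      ['localhost:%s' % p for p in partner_ports])
    while True:
        # _getLeader() is an internal helper; it stays None until a majority
        # of the cluster (2 of 3 here) is alive and has elected a leader,
        # which is why killing 2 of 3 nodes leaves "leader: None".
        print('leader: %s' % counter._getLeader())
        time.sleep(2.0)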

@bakwc I tested with 4 nodes, 3 of them live, and the re-election didn't happen.

I saw this behaviour too. In my case I was testing with 4 nodes, 3 were active, and I killed the leader.
There was no re-election.
I cannot reproduce this every time, though.

kristofs-MacBook-Pro:recordchain kristofdespiegeleer$ syncobj_admin -conn 127.0.0.1:6001 -status -pass 1233
commit_idx: 1966
enabled_code_version: 0
last_applied: 1966
leader: localhost:6000
leader_commit_idx: 1966
log_len: 972
match_idx_count: 3
match_idx_server_localhost:6000: 0
match_idx_server_localhost:6002: 0
match_idx_server_localhost:6003: 0
next_node_idx_count: 3
next_node_idx_server_localhost:6000: 2
next_node_idx_server_localhost:6002: 2
next_node_idx_server_localhost:6003: 2
partner_node_status_server_localhost:6000: 2
partner_node_status_server_localhost:6002: 2
partner_node_status_server_localhost:6003: 2
partner_nodes_count: 3
raft_term: 2
readonly_nodes_count: 0
revision: 1899fe752bde334787dbfa54bb51bbd9fcf2826c
self: localhost:6001
self_code_version: 0
state: 0
unknown_connections_count: 1
uptime: 88
version: 0.3.3

Now I kill the leader:

kristofs-MacBook-Pro:recordchain kristofdespiegeleer$ syncobj_admin -conn 127.0.0.1:6001 -status -pass 1233
commit_idx: 3306
enabled_code_version: 0
last_applied: 3306
leader: None
leader_commit_idx: 3306
log_len: 316
match_idx_count: 3
match_idx_server_localhost:6000: 0
match_idx_server_localhost:6002: 0
match_idx_server_localhost:6003: 0
next_node_idx_count: 3
next_node_idx_server_localhost:6000: 2
next_node_idx_server_localhost:6002: 2
next_node_idx_server_localhost:6003: 2
partner_node_status_server_localhost:6000: 0
partner_node_status_server_localhost:6002: 0
partner_node_status_server_localhost:6003: 0
partner_nodes_count: 3
raft_term: 3
readonly_nodes_count: 0
revision: 1899fe752bde334787dbfa54bb51bbd9fcf2826c
self: localhost:6001
self_code_version: 0
state: 1
unknown_connections_count: 1
uptime: 163
version: 0.3.3

I am running the servers in tmux;
the 3 non-leaders are still running, but all of them now get errors when setting data.

The code we use for testing:
https://github.com/rivine/recordchain/edit/master/JumpScale9RecordChain/servers/raft/README.md

Restarting the leader leaves everything in limbo.

bakwc commented

Thanks for the report! I'll try to reproduce it. How long does it take to get into this situation? Does it reproduce only when you have a password-protected cluster?

I'll try a non-password-protected cluster; I'll do it now.

Yes indeed, that seems to be the issue; without a password I cannot reproduce it.

bakwc commented

I tested with 4 nodes, 3 of them live, and the re-election didn't happen

@xmonader, did you use a password? You created a 4-node cluster, killed only one node (the leader), and no new leader was elected?

bakwc commented

Please try increasing the following config options; set them to:

raftMinTimeout = 1.0
raftMaxTimeout = 3.0
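
For example, a sketch of where those options go, assuming a password-protected cluster like the one above (the password, addresses and variable names are placeholders):

from pysyncobj import SyncObj, SyncObjConf

# Election timeouts are picked randomly between raftMinTimeout and
# raftMaxTimeout; raising them gives followers more time to hear from
# the leader before they start a new election.
conf = SyncObjConf(
    raftMinTimeout=1.0,
    raftMaxTimeout=3.0,
    password='1233',  # placeholder; omit for an unprotected cluster
)

syncObj = SyncObj('localhost:6001',
                  ['localhost:6000', 'localhost:6002', 'localhost:6003'],
                  conf=conf)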

Sorry, I was away all this time; I will try it.

bakwc commented

Do you use Python 2 or Python 3?

Any update on this?
I have this issue too.

bakwc commented

Could you please provide more details? What are your reproduction steps? What is the cluster size? How many nodes were alive?

My cluster size is 4.
First I created a cluster and dynamically added 3 other nodes. I monitored syncObj._SyncObj__connectedNodes and saw 4 nodes, with the first node as leader.
When I kill other nodes the cluster keeps working: the killed node stays in syncObj._SyncObj__otherNodes but is removed from syncObj._SyncObj__connectedNodes.
But for the leader it's different: if I kill the leader, different nodes show different contents in syncObj._SyncObj__connectedNodes and none of them becomes leader.
I just killed the leader.
I also used raftMinTimeout = 1.0 and raftMaxTimeout = 3.0.

Also, when I removed a node from the cluster, it remained in otherNodes on the non-leader nodes. That may also be related to this.
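
For reference, this is roughly how I watch the membership while killing nodes (a simplified sketch; the address and timeouts are placeholders, my real test script is longer):

import time

from pysyncobj import SyncObj, SyncObjConf

conf = SyncObjConf(dynamicMembershipChange=True,
                   raftMinTimeout=1.0,
                   raftMaxTimeout=3.0)
syncObj = SyncObj('localhost:6000', [], conf=conf)

while True:
    # __otherNodes and __connectedNodes are name-mangled internals of
    # SyncObj, hence the _SyncObj__ prefix used below.
    print('leader: %s' % syncObj._getLeader())
    print('others: %s' % syncObj._SyncObj__otherNodes)
    print('connected: %s' % syncObj._SyncObj__connectedNodes)
    time.sleep(2.0)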

bakwc commented

Thanks for the report, I'll check it. What script did you use for testing? Could you post it somewhere (pastebin.com)? Does it always reproduce, or only from time to time?
Your steps were:

  1. Start a single-node cluster
  2. Add 3 more nodes dynamically
  3. Kill the first node
  4. No new leader was elected

Right?

bakwc commented

Checked multiple times, can't reproduce. Could you please provide detailed step-by-step instructions for your actions?

Actually, I did what you explained.
I can share my code with you. I am writing a tool for dynamic cluster extension: I have 10 nodes ready to join and, for example, a target cluster size of 4, and the cluster should expand itself to reach 4. After that I will change the target to 8, then reduce it to 3, so I need to be able to add and remove nodes.
To do this the cluster has to stay available through different failures. https://pastebin.com/svvG3eHK is my (very dirty) test code. I use a port scanner to find other nodes. I have some problems adding and removing nodes too.

Also, if you want, I can share my screen on a call.

bakwc commented

When adding nodes you need to specify all current cluster nodes manually. Added #112 to implement auto-discovery.
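
For example, a sketch only (addresses are placeholders, dynamicMembershipChange must be enabled on every node, and the two SyncObj instances below would normally live in separate processes):

from pysyncobj import SyncObj, SyncObjConf

conf = SyncObjConf(dynamicMembershipChange=True)

# The joining node must be started with *all* current cluster nodes listed,
# not just the one it happened to discover:
new_node = SyncObj('localhost:6003',
                   ['localhost:6000', 'localhost:6001', 'localhost:6002'],
                   conf=conf)

# On a node that is already part of the cluster (here the one bound to
# localhost:6000), register the newcomer; the membership change is routed
# through the current leader and replicated like any other log entry:
member = SyncObj('localhost:6000',
                 ['localhost:6001', 'localhost:6002'],
                 conf=conf)
member.addNodeToCluster('localhost:6003',
                        callback=lambda res, err: print('add: %s %s' % (res, err)))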

Thank you,
it works for me.