ovn-org/ovn

ovsdb follower always join the leader and status is catchup

gujun4990 opened this issue ยท 16 comments

Version

ovsdb-server: 2.16.2

Description

I have 3 ovsdb-servers to construct a cluster. sb0 is leader, sb1 and sb2 are followers. The operation that sb1 join sb0 is success, but failure for sb2.

cluster information

[root@node-3 gj]# kubectl exec -itn openstack     ovn-ovsdb-sb-0 -- ovs-appctl -t /var/run/ovn/ovnsb_db.ctl cluster/status OVN_Southbound
Defaulted container "ovn-ovsdb-sb" out of: ovn-ovsdb-sb, init (init)
396d
Name: OVN_Southbound
Cluster ID: e15f (e15f359e-f899-4e18-9188-6b38acdb0b3f)
Server ID: 396d (396d6421-31e0-4a4a-b50f-73db4e7898a5)
Address: tcp:ovn-ovsdb-sb-0.ovn-ovsdb-sb-discovery.openstack.svc.cluster.local:6644
Status: cluster member
Adding server ad90 (ad90 at tcp:ovn-ovsdb-sb-2.ovn-ovsdb-sb-discovery.openstack.svc.cluster.local:6644) (adding: catchup)
Role: leader
Term: 4
Leader: self
Vote: self

Last Election started 5449532 ms ago, reason: timeout
Last Election won: 5449524 ms ago
Election timer: 8000
Log: [2, 17]
Entries not yet committed: 0
Entries not yet applied: 0
Connections: <-248e ->248e <-ad90
Disconnections: 0
Servers:
    396d (396d at tcp:ovn-ovsdb-sb-0.ovn-ovsdb-sb-discovery.openstack.svc.cluster.local:6644) (self) next_index=8 match_index=16
    248e (248e at tcp:ovn-ovsdb-sb-1.ovn-ovsdb-sb-discovery.openstack.svc.cluster.local:6644) next_index=17 match_index=16 last msg 475 ms ago

[root@node-3 gj]# kubectl exec -itn openstack     ovn-ovsdb-sb-1 -- ovs-appctl -t /var/run/ovn/ovnsb_db.ctl cluster/status OVN_Southbound 
Defaulted container "ovn-ovsdb-sb" out of: ovn-ovsdb-sb, init (init)
248e
Name: OVN_Southbound
Cluster ID: e15f (e15f359e-f899-4e18-9188-6b38acdb0b3f)
Server ID: 248e (248e41fd-53ab-4052-bf72-eba1109dc886)
Address: tcp:ovn-ovsdb-sb-1.ovn-ovsdb-sb-discovery.openstack.svc.cluster.local:6644
Status: cluster member
Role: follower
Term: 4
Leader: 396d
Vote: 396d

Last Election started 5490476 ms ago, reason: timeout
Election timer: 8000
Log: [2, 17]
Entries not yet committed: 0
Entries not yet applied: 0
Connections: ->0000 <-396d <-ad90
Disconnections: 0
Servers:
    396d (396d at tcp:ovn-ovsdb-sb-0.ovn-ovsdb-sb-discovery.openstack.svc.cluster.local:6644) last msg 896 ms ago
    248e (248e at tcp:ovn-ovsdb-sb-1.ovn-ovsdb-sb-discovery.openstack.svc.cluster.local:6644) (self)
[root@node-3 gj]# 

[root@node-3 gj]# kubectl exec -itn openstack     ovn-ovsdb-sb-2 -- ovs-appctl -t /var/run/ovn/ovnsb_db.ctl cluster/status OVN_Southbound 
Defaulted container "ovn-ovsdb-sb" out of: ovn-ovsdb-sb, init (init)
ad90
Name: OVN_Southbound
Cluster ID: e15f (e15f359e-f899-4e18-9188-6b38acdb0b3f)
Server ID: ad90 (ad906b08-637a-4419-aab4-414aa61ee742)
Address: tcp:ovn-ovsdb-sb-2.ovn-ovsdb-sb-discovery.openstack.svc.cluster.local:6644
Status: joining cluster
Remotes for joining: tcp:ovn-ovsdb-sb-0.ovn-ovsdb-sb-discovery.openstack.svc.cluster.local:6644 tcp:ovn-ovsdb-sb-1.ovn-ovsdb-sb-discovery.openstack.svc.cluster.local:6644
Role: follower
Term: 1
Leader: 396d
Vote: unknown

Election timer: 1000
Log: [1, 1]
Entries not yet committed: 0
Entries not yet applied: 0
Connections: ->0000 ->0000
Disconnections: 0
Servers:

Related logs

  • the logs for sb0
2024-02-29T05:07:35Z|00002|raft|INFO|term 1: 3388678 ms timeout expired, starting election
2024-02-29T05:07:35Z|00003|raft|INFO|term 1: elected leader by 1+ of 1 servers
2024-02-29T05:07:35Z|00004|raft|INFO|local server ID is 396d
2024-02-29T05:07:35Z|00005|ovsdb_server|INFO|ovsdb-server (Open vSwitch) 2.16.2.20231023
2024-02-29T05:07:35Z|00001|reconnect|INFO|unix:/var/run/ovn/ovnsb_db.sock: connecting...
2024-02-29T05:07:35Z|00002|reconnect|INFO|unix:/var/run/ovn/ovnsb_db.sock: connected
Waiting for OVN_Southbound to come up [  OK  ]
2024-02-29T05:07:45Z|00006|memory|INFO|15392 kB peak resident set size after 10.0 seconds
2024-02-29T05:07:45Z|00007|memory|INFO|cells:20 monitors:0 raft-log:1
2024-02-29T05:07:52Z|00008|raft|INFO|tcp:10.232.0.74:58110: learned server ID 248e
2024-02-29T05:07:52Z|00009|raft|INFO|tcp:10.232.0.74:58110: learned remote address tcp:ovn-ovsdb-sb-1.ovn-ovsdb-sb-discovery.openstack.svc.cluster.local:6644
2024-02-29T05:07:52Z|00010|raft|INFO|starting to add server 248e (248e at tcp:ovn-ovsdb-sb-1.ovn-ovsdb-sb-discovery.openstack.svc.cluster.local:6644) to cluster e15f
2024-02-29T05:07:52Z|00011|raft|INFO|sending snapshot to server 248e, 1:1.
2024-02-29T05:07:52Z|00012|raft|INFO|cluster e15f: installed snapshot on server 248e  up to 1:1
2024-02-29T05:07:52Z|00013|raft|INFO|adding 248e (248e at tcp:ovn-ovsdb-sb-1.ovn-ovsdb-sb-discovery.openstack.svc.cluster.local:6644) to cluster e15f failed (in progress)
2024-02-29T05:07:52Z|00014|stream_tcp|WARN|tcp:ovn-ovsdb-sb-1.ovn-ovsdb-sb-discovery.openstack.svc.cluster.local:6644: connect: Address family not supported by protocol
2024-02-29T05:07:52Z|00015|reconnect|INFO|tcp:ovn-ovsdb-sb-1.ovn-ovsdb-sb-discovery.openstack.svc.cluster.local:6644: connecting...
2024-02-29T05:07:52Z|00016|reconnect|INFO|tcp:ovn-ovsdb-sb-1.ovn-ovsdb-sb-discovery.openstack.svc.cluster.local:6644: connection attempt failed (Address family not supported by protocol)
2024-02-29T05:07:52Z|00017|raft|INFO|adding 248e (248e at tcp:ovn-ovsdb-sb-1.ovn-ovsdb-sb-discovery.openstack.svc.cluster.local:6644) to cluster e15f succeeded (completed)
2024-02-29T05:07:53Z|00018|reconnect|INFO|tcp:ovn-ovsdb-sb-1.ovn-ovsdb-sb-discovery.openstack.svc.cluster.local:6644: connecting...
2024-02-29T05:07:53Z|00019|reconnect|INFO|tcp:ovn-ovsdb-sb-1.ovn-ovsdb-sb-discovery.openstack.svc.cluster.local:6644: connection attempt failed (Address family not supported by protocol)
2024-02-29T05:07:53Z|00020|reconnect|INFO|tcp:ovn-ovsdb-sb-1.ovn-ovsdb-sb-discovery.openstack.svc.cluster.local:6644: waiting 2 seconds before reconnect
2024-02-29T05:07:55Z|00021|reconnect|INFO|tcp:ovn-ovsdb-sb-1.ovn-ovsdb-sb-discovery.openstack.svc.cluster.local:6644: connecting...
2024-02-29T05:07:55Z|00022|reconnect|INFO|tcp:ovn-ovsdb-sb-1.ovn-ovsdb-sb-discovery.openstack.svc.cluster.local:6644: connection attempt failed (Address family not supported by protocol)
2024-02-29T05:07:55Z|00023|reconnect|INFO|tcp:ovn-ovsdb-sb-1.ovn-ovsdb-sb-discovery.openstack.svc.cluster.local:6644: waiting 4 seconds before reconnect
2024-02-29T05:07:57Z|00024|raft|INFO|tcp:10.232.1.47:32832: learned server ID ad90
2024-02-29T05:07:57Z|00025|raft|INFO|tcp:10.232.1.47:32832: learned remote address tcp:ovn-ovsdb-sb-2.ovn-ovsdb-sb-discovery.openstack.svc.cluster.local:6644
2024-02-29T05:07:57Z|00026|raft|INFO|starting to add server ad90 (ad90 at tcp:ovn-ovsdb-sb-2.ovn-ovsdb-sb-discovery.openstack.svc.cluster.local:6644) to cluster e15f
2024-02-29T05:07:57Z|00027|raft|INFO|adding ad90 (ad90 at tcp:ovn-ovsdb-sb-2.ovn-ovsdb-sb-discovery.openstack.svc.cluster.local:6644) to cluster e15f failed (in progress)
2024-02-29T05:07:58Z|00028|raft|WARN|ignoring vote request received as leader
2024-02-29T05:07:58Z|00029|raft|INFO|adding ad90 (ad90 at tcp:ovn-ovsdb-sb-2.ovn-ovsdb-sb-discovery.openstack.svc.cluster.local:6644) to cluster e15f failed (in progress)
2024-02-29T05:07:58Z|00030|timeval|WARN|Unreasonably long 1048ms poll interval (1ms user, 0ms system)
2024-02-29T05:07:58Z|00031|timeval|WARN|disk: 0 reads, 24 writes
2024-02-29T05:07:58Z|00032|timeval|WARN|context switches: 1 voluntary, 1 involuntary
2024-02-29T05:07:58Z|00033|coverage|INFO|Event coverage, avg rate over last: 5 seconds, last minute, last hour,  hash=d685b6af:
2024-02-29T05:07:58Z|00034|coverage|INFO|hmap_pathological          0.0/sec     0.100/sec        0.0017/sec   total: 6
2024-02-29T05:07:58Z|00035|coverage|INFO|hmap_expand               25.8/sec    18.983/sec        0.3164/sec   total: 1669
2024-02-29T05:07:58Z|00036|coverage|INFO|lockfile_lock              0.0/sec     0.017/sec        0.0003/sec   total: 1
2024-02-29T05:07:58Z|00037|coverage|INFO|poll_create_node          31.4/sec     7.667/sec        0.1278/sec   total: 746
2024-02-29T05:07:58Z|00038|coverage|INFO|poll_zero_timeout          0.4/sec     0.117/sec        0.0019/sec   total: 12
2024-02-29T05:07:58Z|00039|coverage|INFO|seq_change                 7.4/sec     1.950/sec        0.0325/sec   total: 169
2024-02-29T05:07:58Z|00040|coverage|INFO|pstream_open               0.0/sec     0.050/sec        0.0008/sec   total: 4
2024-02-29T05:07:58Z|00041|coverage|INFO|stream_open                0.4/sec     0.033/sec        0.0006/sec   total: 3
2024-02-29T05:07:58Z|00042|coverage|INFO|unixctl_received           0.0/sec     0.017/sec        0.0003/sec   total: 1
2024-02-29T05:07:58Z|00043|coverage|INFO|unixctl_replied            0.0/sec     0.017/sec        0.0003/sec   total: 1
2024-02-29T05:07:58Z|00044|coverage|INFO|util_xalloc              652.0/sec   389.150/sec        6.4858/sec   total: 35954
2024-02-29T05:07:58Z|00045|coverage|INFO|114 events never hit
2024-02-29T05:07:58Z|00046|raft|INFO|adding ad90 (ad90 at tcp:ovn-ovsdb-sb-2.ovn-ovsdb-sb-discovery.openstack.svc.cluster.local:6644) to cluster e15f failed (lost leadership)
2024-02-29T05:07:58Z|00047|raft|INFO|rejected append_reply (not leader)
2024-02-29T05:07:59Z|00048|reconnect|INFO|tcp:ovn-ovsdb-sb-1.ovn-ovsdb-sb-discovery.openstack.svc.cluster.local:6644: connecting...
2024-02-29T05:07:59Z|00049|reconnect|INFO|tcp:ovn-ovsdb-sb-1.ovn-ovsdb-sb-discovery.openstack.svc.cluster.local:6644: connected
2024-02-29T05:07:59Z|00050|raft|INFO|adding ad90 (ad90 at tcp:ovn-ovsdb-sb-2.ovn-ovsdb-sb-discovery.openstack.svc.cluster.local:6644) to cluster e15f failed (not leader)
2024-02-29T05:08:00Z|00051|raft|INFO|term 4: 1689 ms timeout expired, starting election
2024-02-29T05:08:00Z|00052|raft|INFO|term 4: elected leader by 2+ of 2 servers
2024-02-29T05:08:00Z|00053|transaction|INFO|Set OVN_Southbound max transaction historys: 5000
2024-02-29T05:08:00Z|00054|raft|INFO|adding ad90 (ad90 at tcp:ovn-ovsdb-sb-2.ovn-ovsdb-sb-discovery.openstack.svc.cluster.local:6644) to cluster e15f failed (in progress)
2024-02-29T05:08:01Z|00055|raft|INFO|adding ad90 (ad90 at tcp:ovn-ovsdb-sb-2.ovn-ovsdb-sb-discovery.openstack.svc.cluster.local:6644) to cluster e15f failed (in progress)
2024-02-29T05:08:02Z|00056|raft|INFO|rejecting term 1 < current term 4 received in append_reply message from server ad90
2024-02-29T05:08:02Z|00057|raft|INFO|adding ad90 (ad90 at tcp:ovn-ovsdb-sb-2.ovn-ovsdb-sb-discovery.openstack.svc.cluster.local:6644) to cluster e15f failed (in progress)
2024-02-29T05:08:03Z|00058|raft|INFO|adding ad90 (ad90 at tcp:ovn-ovsdb-sb-2.ovn-ovsdb-sb-discovery.openstack.svc.cluster.local:6644) to cluster e15f failed (in progress)
2024-02-29T05:08:04Z|00059|raft|INFO|adding ad90 (ad90 at tcp:ovn-ovsdb-sb-2.ovn-ovsdb-sb-discovery.openstack.svc.cluster.local:6644) to cluster e15f failed (in progress)
  • the logs for sb1
2024-02-29T05:07:52Z|00019|reconnect|INFO|tcp:ovn-ovsdb-sb-0.ovn-ovsdb-sb-discovery.openstack.svc.cluster.local:6644: connected
2024-02-29T05:07:52Z|00020|raft_rpc|INFO|learned cluster ID e15f
2024-02-29T05:07:52Z|00021|raft|INFO|tcp:ovn-ovsdb-sb-0.ovn-ovsdb-sb-discovery.openstack.svc.cluster.local:6644: learned server ID 396d
2024-02-29T05:07:52Z|00022|raft|INFO|server 396d is leader for term 1
2024-02-29T05:07:52Z|00023|raft|INFO|rejecting append_request because previous entry 1,2 not in local log (mismatch past end of log)
2024-02-29T05:07:52Z|00024|raft|INFO|server 396d added to configuration
2024-02-29T05:07:52Z|00025|raft|INFO|server 248e added to configuration
  • the logs for sb2
2024-02-29T05:07:57Z|00019|reconnect|INFO|tcp:ovn-ovsdb-sb-0.ovn-ovsdb-sb-discovery.openstack.svc.cluster.local:6644: connected
2024-02-29T05:07:57Z|00020|raft_rpc|INFO|learned cluster ID e15f
2024-02-29T05:07:57Z|00021|raft|INFO|tcp:ovn-ovsdb-sb-0.ovn-ovsdb-sb-discovery.openstack.svc.cluster.local:6644: learned server ID 396d
2024-02-29T05:07:57Z|00022|raft|INFO|server 396d is leader for term 1
2024-02-29T05:07:57Z|00023|raft|INFO|rejecting append_request because previous entry 1,6 not in local log (mismatch past end of log)
2024-02-29T05:07:57Z|00024|raft|INFO|tcp:ovn-ovsdb-sb-1.ovn-ovsdb-sb-discovery.openstack.svc.cluster.local:6644: learned new server address for joining cluster
2024-02-29T05:07:57Z|00025|reconnect|INFO|tcp:ovn-ovsdb-sb-1.ovn-ovsdb-sb-discovery.openstack.svc.cluster.local:6644: connecting...
2024-02-29T05:07:57Z|00026|reconnect|INFO|tcp:ovn-ovsdb-sb-1.ovn-ovsdb-sb-discovery.openstack.svc.cluster.local:6644: connection attempt failed (Address family not supported by protocol)
2024-02-29T05:07:58Z|00027|reconnect|INFO|tcp:ovn-ovsdb-sb-1.ovn-ovsdb-sb-discovery.openstack.svc.cluster.local:6644: connecting...
2024-02-29T05:07:58Z|00028|reconnect|INFO|tcp:ovn-ovsdb-sb-1.ovn-ovsdb-sb-discovery.openstack.svc.cluster.local:6644: connection attempt failed (Address family not supported by protocol)
2024-02-29T05:07:58Z|00029|reconnect|INFO|tcp:ovn-ovsdb-sb-1.ovn-ovsdb-sb-discovery.openstack.svc.cluster.local:6644: waiting 2 seconds before reconnect
2024-02-29T05:08:00Z|00030|reconnect|INFO|tcp:ovn-ovsdb-sb-1.ovn-ovsdb-sb-discovery.openstack.svc.cluster.local:6644: connecting...
2024-02-29T05:08:00Z|00031|reconnect|INFO|tcp:ovn-ovsdb-sb-1.ovn-ovsdb-sb-discovery.openstack.svc.cluster.local:6644: connection attempt failed (Address family not supported by protocol)
2024-02-29T05:08:00Z|00032|reconnect|INFO|tcp:ovn-ovsdb-sb-1.ovn-ovsdb-sb-discovery.openstack.svc.cluster.local:6644: waiting 4 seconds before reconnect

Analysis

ๅ›พ็‰‡
According to the above process, I found that the sb0 seems to reject sb2's append reply(mismatch past end of log). Because sb0 was not the leader at that time. But I'm not sure that the append_reply is "mismatch past end of log" that sb0 send.
So sb2 always send the operation that joining sb0, but s0 thinks it have been dealing with the operation.
BTW, I have searched the related bug and our code have included the related patch.
@igsilya Have any ideas?

Hi, @gujun4990 . Thanks for the report. The issue indeed looks very similar to one fixed by openvswitch/ovs@b6007ad . However, in your case the sb2 is not disconnecting from sb0. So, when sb0 regains leadership, the sb2 is still in a catch-up state for it and the new join request is ignored. I think, the following change should fix the problem:

diff --git a/ovsdb/raft.c b/ovsdb/raft.c
index f463afcb3..6537c3415 100644
--- a/ovsdb/raft.c
+++ b/ovsdb/raft.c
@@ -2702,9 +2702,10 @@ raft_become_follower(struct raft *raft)
      * AppendEntries processing will complete the corresponding commands.
      */
     struct raft_server *s;
-    HMAP_FOR_EACH (s, hmap_node, &raft->add_servers) {
+    HMAP_FOR_EACH_POP (s, hmap_node, &raft->add_servers) {
         raft_send_add_server_reply__(raft, &s->sid, s->address, false,
                                      RAFT_SERVER_LOST_LEADERSHIP);
+        raft_server_destroy(s);
     }
     if (raft->remove_server) {
         raft_send_remove_server_reply__(raft, &raft->remove_server->sid,

Could you try it?

I'll work on a proper patch and a test for this.

Thanks @igsilya . I will try it and reply to you.

From the analysis, your above patch should be able to solve this issue. But I have a confusion that sb2 send a append_reply(mismatch past end of log) to sb0 and the sb2 is not disconnecting from sb0, why the request of append_reply is not handled? At that time, the leader election has not yet started between sb0 and sb1.
.

sb0 lost leadership before receiving append reply form sb2:

2024-02-29T05:07:58Z|00046|raft|INFO|adding ad90 (ad90 at tcp:ovn-ovsdb-sb-2.ovn-ovsdb-sb-discovery.openstack.svc.cluster.local:6644) to cluster e15f failed (lost leadership)
2024-02-29T05:07:58Z|00047|raft|INFO|rejected append_reply (not leader)

So the reply got discarded as sb0 wasn't a leader anymore.

Yes, sb0 reject a append_reply, but I'm not sure the append_reply is "mismatch past end of log" from sb2.

2024-02-29T05:07:57Z|00024|raft|INFO|tcp:10.232.1.47:32832: learned server ID ad90
2024-02-29T05:07:57Z|00025|raft|INFO|tcp:10.232.1.47:32832: learned remote address tcp:ovn-ovsdb-sb-2.ovn-ovsdb-sb-discovery.openstack.svc.cluster.local:6644
2024-02-29T05:07:57Z|00026|raft|INFO|starting to add server ad90 (ad90 at tcp:ovn-ovsdb-sb-2.ovn-ovsdb-sb-discovery.openstack.svc.cluster.local:6644) to cluster e15f
2024-02-29T05:07:57Z|00027|raft|INFO|adding ad90 (ad90 at tcp:ovn-ovsdb-sb-2.ovn-ovsdb-sb-discovery.openstack.svc.cluster.local:6644) to cluster e15f failed (in progress)
2024-02-29T05:07:58Z|00028|raft|WARN|ignoring vote request received as leader
2024-02-29T05:07:58Z|00029|raft|INFO|adding ad90 (ad90 at tcp:ovn-ovsdb-sb-2.ovn-ovsdb-sb-discovery.openstack.svc.cluster.local:6644) to cluster e15f failed (in progress)
2024-02-29T05:07:58Z|00046|raft|INFO|adding ad90 (ad90 at tcp:ovn-ovsdb-sb-2.ovn-ovsdb-sb-discovery.openstack.svc.cluster.local:6644) to cluster e15f failed (lost leadership)
2024-02-29T05:07:58Z|00047|raft|INFO|rejected append_reply (not leader)

From the above logs from sb0, the append_reply("mismatch past end of log") from sb2 should be handled before the first log of "adding ad90...to cluster e15f failed (in progress)". The log of "adding ad90..." from sb2 shows that sb2 send a add_server_request to sb0 again. This is the key point of my confusion.
In addition, Our another environment also have the same problem, but have no "rejected append_reply (not leader)".

2024-01-30T15:28:04Z|00001|vlog|INFO|opened log file /var/log/ovn/ovsdb-server-sb.log
2024-01-30T15:28:04Z|00002|raft|INFO|term 1: 2098796 ms timeout expired, starting election
2024-01-30T15:28:04Z|00003|raft|INFO|term 1: elected leader by 1+ of 1 servers
2024-01-30T15:28:04Z|00004|raft|INFO|local server ID is 4f3c
2024-01-30T15:28:04Z|00005|ovsdb_server|INFO|ovsdb-server (Open vSwitch) 2.16.2.20231023
2024-01-30T15:28:05Z|00001|reconnect|INFO|unix:/var/run/ovn/ovnsb_db.sock: connecting...
2024-01-30T15:28:05Z|00002|reconnect|INFO|unix:/var/run/ovn/ovnsb_db.sock: connected
Waiting for OVN_Southbound to come up [  OK  ]
2024-01-30T15:28:14Z|00006|memory|INFO|15408 kB peak resident set size after 10.0 seconds
2024-01-30T15:28:14Z|00007|memory|INFO|cells:20 monitors:0 raft-log:1
2024-01-30T15:28:19Z|00008|raft|INFO|tcp:10.232.0.71:51078: learned server ID 7a60
2024-01-30T15:28:19Z|00009|raft|INFO|tcp:10.232.0.71:51078: learned remote address tcp:ovn-ovsdb-sb-1.ovn-ovsdb-sb-discovery.openstack.svc.cluster.local:6644
2024-01-30T15:28:19Z|00010|raft|INFO|starting to add server 7a60 (7a60 at tcp:ovn-ovsdb-sb-1.ovn-ovsdb-sb-discovery.openstack.svc.cluster.local:6644) to cluster 1c89
2024-01-30T15:28:19Z|00011|raft|INFO|sending snapshot to server 7a60, 1:1.
2024-01-30T15:28:19Z|00012|raft|INFO|cluster 1c89: installed snapshot on server 7a60  up to 1:1
2024-01-30T15:28:19Z|00013|stream_tcp|WARN|tcp:ovn-ovsdb-sb-1.ovn-ovsdb-sb-discovery.openstack.svc.cluster.local:6644: connect: Address family not supported by protocol
2024-01-30T15:28:19Z|00014|reconnect|INFO|tcp:ovn-ovsdb-sb-1.ovn-ovsdb-sb-discovery.openstack.svc.cluster.local:6644: connecting...
2024-01-30T15:28:19Z|00015|reconnect|INFO|tcp:ovn-ovsdb-sb-1.ovn-ovsdb-sb-discovery.openstack.svc.cluster.local:6644: connection attempt failed (Address family not supported by protocol)
2024-01-30T15:28:19Z|00016|raft|INFO|adding 7a60 (7a60 at tcp:ovn-ovsdb-sb-1.ovn-ovsdb-sb-discovery.openstack.svc.cluster.local:6644) to cluster 1c89 succeeded (completed)
2024-01-30T15:28:20Z|00017|reconnect|INFO|tcp:ovn-ovsdb-sb-1.ovn-ovsdb-sb-discovery.openstack.svc.cluster.local:6644: connecting...
2024-01-30T15:28:20Z|00018|reconnect|INFO|tcp:ovn-ovsdb-sb-1.ovn-ovsdb-sb-discovery.openstack.svc.cluster.local:6644: connection attempt failed (Address family not supported by protocol)
2024-01-30T15:28:20Z|00019|reconnect|INFO|tcp:ovn-ovsdb-sb-1.ovn-ovsdb-sb-discovery.openstack.svc.cluster.local:6644: waiting 2 seconds before reconnect
2024-01-30T15:28:22Z|00020|reconnect|INFO|tcp:ovn-ovsdb-sb-1.ovn-ovsdb-sb-discovery.openstack.svc.cluster.local:6644: connecting...
2024-01-30T15:28:22Z|00021|reconnect|INFO|tcp:ovn-ovsdb-sb-1.ovn-ovsdb-sb-discovery.openstack.svc.cluster.local:6644: connection attempt failed (Address family not supported by protocol)
2024-01-30T15:28:22Z|00022|reconnect|INFO|tcp:ovn-ovsdb-sb-1.ovn-ovsdb-sb-discovery.openstack.svc.cluster.local:6644: waiting 4 seconds before reconnect
2024-01-30T15:28:24Z|00023|raft|INFO|tcp:10.232.1.70:59534: learned server ID 9574
2024-01-30T15:28:24Z|00024|raft|INFO|tcp:10.232.1.70:59534: learned remote address tcp:ovn-ovsdb-sb-2.ovn-ovsdb-sb-discovery.openstack.svc.cluster.local:6644
2024-01-30T15:28:24Z|00025|raft|INFO|starting to add server 9574 (9574 at tcp:ovn-ovsdb-sb-2.ovn-ovsdb-sb-discovery.openstack.svc.cluster.local:6644) to cluster 1c89
2024-01-30T15:28:25Z|00026|raft|INFO|adding 9574 (9574 at tcp:ovn-ovsdb-sb-2.ovn-ovsdb-sb-discovery.openstack.svc.cluster.local:6644) to cluster 1c89 failed (in progress)
2024-01-30T15:28:26Z|00027|reconnect|INFO|tcp:ovn-ovsdb-sb-1.ovn-ovsdb-sb-discovery.openstack.svc.cluster.local:6644: connecting...
2024-01-30T15:28:26Z|00028|reconnect|INFO|tcp:ovn-ovsdb-sb-1.ovn-ovsdb-sb-discovery.openstack.svc.cluster.local:6644: connection attempt failed (Address family not supported by protocol)
2024-01-30T15:28:26Z|00029|reconnect|INFO|tcp:ovn-ovsdb-sb-1.ovn-ovsdb-sb-discovery.openstack.svc.cluster.local:6644: continuing to reconnect in the background but suppressing further logging
2024-01-30T15:28:26Z|00030|raft|INFO|adding 9574 (9574 at tcp:ovn-ovsdb-sb-2.ovn-ovsdb-sb-discovery.openstack.svc.cluster.local:6644) to cluster 1c89 failed (in progress)
2024-01-30T15:28:27Z|00031|raft|INFO|adding 9574 (9574 at tcp:ovn-ovsdb-sb-2.ovn-ovsdb-sb-discovery.openstack.svc.cluster.local:6644) to cluster 1c89 failed (in progress)
2024-01-30T15:28:28Z|00032|raft|INFO|adding 9574 (9574 at tcp:ovn-ovsdb-sb-2.ovn-ovsdb-sb-discovery.openstack.svc.cluster.local:6644) to cluster 1c89 failed (in progress)
2024-01-30T15:28:29Z|00033|raft|INFO|adding 9574 (9574 at tcp:ovn-ovsdb-sb-2.ovn-ovsdb-sb-discovery.openstack.svc.cluster.local:6644) to cluster 1c89 failed (in progress)
2024-01-30T15:28:31Z|00034|raft|INFO|adding 9574 (9574 at tcp:ovn-ovsdb-sb-2.ovn-ovsdb-sb-discovery.openstack.svc.cluster.local:6644) to cluster 1c89 failed (lost leadership)
2024-01-30T15:28:31Z|00035|raft|INFO|term 2: starting election
2024-01-30T15:28:31Z|00036|raft|INFO|adding 9574 (9574 at tcp:ovn-ovsdb-sb-2.ovn-ovsdb-sb-discovery.openstack.svc.cluster.local:6644) to cluster 1c89 failed (not leader)
2024-01-30T15:28:31Z|00037|raft|INFO|adding 9574 (9574 at tcp:ovn-ovsdb-sb-2.ovn-ovsdb-sb-discovery.openstack.svc.cluster.local:6644) to cluster 1c89 failed (not leader)
2024-01-30T15:28:32Z|00038|raft|INFO|term 3: 1284 ms timeout expired, starting election
2024-01-30T15:28:33Z|00039|reconnect|WARN|tcp:10.232.0.71:51078: no response to inactivity probe after 2 seconds, disconnecting
2024-01-30T15:28:33Z|00040|raft|INFO|adding 9574 (9574 at tcp:ovn-ovsdb-sb-2.ovn-ovsdb-sb-discovery.openstack.svc.cluster.local:6644) to cluster 1c89 failed (not leader)

The log of "adding ad90..." from sb2 shows that sb2 send a add_server_request to sb0 again. This is the key point of my confusion.

It is assumed that the connection is happening shortly after raft_open(), so join_timeout is not updated after the first sending of the join request in raft_conn_run(). So, if it takes a bit longer to establish the first connection to the sb0, the join re-try can happen in raft_run() much earlier than a default 1 second. This can be avoided by doing something like this:

diff --git a/ovsdb/raft.c b/ovsdb/raft.c
index f463afcb3..239b092ee 100644
--- a/ovsdb/raft.c
+++ b/ovsdb/raft.c
@@ -1582,6 +1582,7 @@ raft_conn_run(struct raft *raft, struct raft_conn *conn)
     conn->js_seqno = new_seqno;
     if (just_connected) {
         if (raft->joining) {
+            raft->join_timeout = time_msec() + 1000;
             raft_send_add_server_request(raft, conn);
         } else if (raft->leaving) {
             union raft_rpc rq = {

But is is a bit awkward to change global join timeout in a function that supposed to handle a specific connection, so I'm not sure if we need to do that. This join request should be harmless in general.

Our another environment also have the same problem, but have no "rejected append_reply (not leader)".

In this case it looks like sb1 disconnected after joining, so the cluster has no quorum to elect a leader and it is not able to join a new server for the same reason.

Thank you for your patient reply. In addition, I tried to reproduce the problem several times, but failed. Because There was no leadership election between the sb0 and a new join server. So whether there are some ways to reproduce the issue.

@gujun4990 It is hard to reproduce indeed. I don't know of any reliable way to do this without artificially creating the issue from within the code.
We have a few failure-test appctl commands that can simulate certain failure scenarios, so I hope to use those for the unit test, but they should not be used in production environment as most of them are just crashing the process and not designed to be used outside of unit tests.

Ok, I will try it.

So, I was working on the test for this issue and I managed to get the cluster of 3 nodes into a state where an elected leader thinks that it didn't join the cluster yet... It's likely unrelated to the issue here, but I need to investigate that before I can continue with the test.

I've tried to reproduce this problem by constructing a few scenarios, but all have failed, so I'm analyzing the code to see if I've achieved my goal.

Here is a patch I made: igsilya/ovs@57fa3ba It contains a test that does reproduce the issue through a new failure test command.

A branch with a few more required fixes (5 total) including the other issue with a leader in a joining state is available here: https://github.com/igsilya/ovs/commits/tmp-raft-join/ . My plan is to have another look at them and post for review on Monday.

@igsilya I have rechecked the logs and analyzed the code again. I suspect sb1 send a append_request to cause rejected append_reply (not leader) from sb0, but not sure. Consider the following sequence of events:

  1. Cluster with 2 nodes - A and B. A is a leader.
  2. C connects to A and sends a join request.
  3. A sends an append request to C. C is in CATCHUP phase for A.
  4. B sends a vote request to A.
  5. A handle vote request and send a vote reply to B.
  6. At the same, A send a heartbeat(append_request) to B.
  7. B rejects vote reply and append request from A, because term 1 < current term 2
  8. For the append request, B will send a append reply(stale term) to A.
  9. A will become follower and rejects append reply from B. So A will print rejected append_reply (not leader)

The related logs from A(sb0):

2024-02-29T05:07:53Z|00020|reconnect|INFO|tcp:ovn-ovsdb-sb-1.ovn-ovsdb-sb-discovery.openstack.svc.cluster.local:6644: waiting 2 seconds before reconnect
2024-02-29T05:07:55Z|00021|reconnect|INFO|tcp:ovn-ovsdb-sb-1.ovn-ovsdb-sb-discovery.openstack.svc.cluster.local:6644: connecting...
2024-02-29T05:07:55Z|00022|reconnect|INFO|tcp:ovn-ovsdb-sb-1.ovn-ovsdb-sb-discovery.openstack.svc.cluster.local:6644: connection attempt failed (Address family not supported by protocol)
2024-02-29T05:07:55Z|00023|reconnect|INFO|tcp:ovn-ovsdb-sb-1.ovn-ovsdb-sb-discovery.openstack.svc.cluster.local:6644: waiting 4 seconds before reconnect
2024-02-29T05:07:57Z|00024|raft|INFO|tcp:10.232.1.47:32832: learned server ID ad90
2024-02-29T05:07:57Z|00025|raft|INFO|tcp:10.232.1.47:32832: learned remote address tcp:ovn-ovsdb-sb-2.ovn-ovsdb-sb-discovery.openstack.svc.cluster.local:6644
2024-02-29T05:07:57Z|00026|raft|INFO|starting to add server ad90 (ad90 at tcp:ovn-ovsdb-sb-2.ovn-ovsdb-sb-discovery.openstack.svc.cluster.local:6644) to cluster e15f
2024-02-29T05:07:57Z|00027|raft|INFO|adding ad90 (ad90 at tcp:ovn-ovsdb-sb-2.ovn-ovsdb-sb-discovery.openstack.svc.cluster.local:6644) to cluster e15f failed (in progress)
2024-02-29T05:07:58Z|00028|raft|WARN|ignoring vote request received as leader
2024-02-29T05:07:58Z|00029|raft|INFO|adding ad90 (ad90 at tcp:ovn-ovsdb-sb-2.ovn-ovsdb-sb-discovery.openstack.svc.cluster.local:6644) to cluster e15f failed (in progress)
2024-02-29T05:07:58Z|00030|timeval|WARN|Unreasonably long 1048ms poll interval (1ms user, 0ms system)
2024-02-29T05:07:58Z|00031|timeval|WARN|disk: 0 reads, 24 writes
2024-02-29T05:07:58Z|00032|timeval|WARN|context switches: 1 voluntary, 1 involuntary
2024-02-29T05:07:58Z|00033|coverage|INFO|Event coverage, avg rate over last: 5 seconds, last minute, last hour,  hash=d685b6af:
2024-02-29T05:07:58Z|00034|coverage|INFO|hmap_pathological          0.0/sec     0.100/sec        0.0017/sec   total: 6
2024-02-29T05:07:58Z|00035|coverage|INFO|hmap_expand               25.8/sec    18.983/sec        0.3164/sec   total: 1669
2024-02-29T05:07:58Z|00036|coverage|INFO|lockfile_lock              0.0/sec     0.017/sec        0.0003/sec   total: 1
2024-02-29T05:07:58Z|00037|coverage|INFO|poll_create_node          31.4/sec     7.667/sec        0.1278/sec   total: 746
2024-02-29T05:07:58Z|00038|coverage|INFO|poll_zero_timeout          0.4/sec     0.117/sec        0.0019/sec   total: 12
2024-02-29T05:07:58Z|00039|coverage|INFO|seq_change                 7.4/sec     1.950/sec        0.0325/sec   total: 169
2024-02-29T05:07:58Z|00040|coverage|INFO|pstream_open               0.0/sec     0.050/sec        0.0008/sec   total: 4
2024-02-29T05:07:58Z|00041|coverage|INFO|stream_open                0.4/sec     0.033/sec        0.0006/sec   total: 3
2024-02-29T05:07:58Z|00042|coverage|INFO|unixctl_received           0.0/sec     0.017/sec        0.0003/sec   total: 1
2024-02-29T05:07:58Z|00043|coverage|INFO|unixctl_replied            0.0/sec     0.017/sec        0.0003/sec   total: 1
2024-02-29T05:07:58Z|00044|coverage|INFO|util_xalloc              652.0/sec   389.150/sec        6.4858/sec   total: 35954
2024-02-29T05:07:58Z|00045|coverage|INFO|114 events never hit
2024-02-29T05:07:58Z|00046|raft|INFO|adding ad90 (ad90 at tcp:ovn-ovsdb-sb-2.ovn-ovsdb-sb-discovery.openstack.svc.cluster.local:6644) to cluster e15f failed (lost leadership)
2024-02-29T05:07:58Z|00047|raft|INFO|rejected append_reply (not leader)
2024-02-29T05:07:59Z|00048|reconnect|INFO|tcp:ovn-ovsdb-sb-1.ovn-ovsdb-sb-discovery.openstack.svc.cluster.local:6644: connecting...
2024-02-29T05:07:59Z|00049|reconnect|INFO|tcp:ovn-ovsdb-sb-1.ovn-ovsdb-sb-discovery.openstack.svc.cluster.local:6644: connected

The related logs from B(sb1):

2024-02-29T05:07:58Z|00032|raft|INFO|term 2: 1036 ms timeout expired, starting election
2024-02-29T05:07:58Z|00033|raft|INFO|rejecting term 1 < current term 2 received in append_request message from server 396d
2024-02-29T05:07:58Z|00034|raft|INFO|rejecting term 1 < current term 2 received in vote_reply message from server 396d
2024-02-29T05:07:59Z|00035|raft|INFO|tcp:10.232.2.60:49480: learned server ID 396d
2024-02-29T05:07:59Z|00036|raft|INFO|tcp:10.232.2.60:49480: learned remote address tcp:ovn-ovsdb-sb-0.ovn-ovsdb-sb-discovery.openstack.svc.cluster.local:6644
2024-02-29T05:08:00Z|00037|raft|INFO|term 3: 1264 ms timeout expired, starting election
2024-02-29T05:08:00Z|00038|raft|INFO|server 396d is leader for term 4
2024-02-29T05:08:00Z|00039|raft|INFO|rejecting append_request because previous entry 1,7 not in local log (mismatch past end of log)
2024-02-29T05:08:00Z|00040|transaction|INFO|Set OVN_Southbound max transaction historys: 5000

Although sb0 have disconnected logs with sb1, sb0 seems to receive rpc msg from sb1. Because only sb1 send a vote request to sb0 and sb0 print the logs that ignoring vote request received as leader.
If the above process is correct, How does sb0 handle the append_reply("mismatch past end of log") from sb2?

@gujun4990 It's hard to tell what happened without full logs from the start of these processes. I'd guess that sb0->sb1 connection got broken while sb1->sb0 connection remained functional. So, sb0 was trying to send append requests to sb1, but discovered that it disconnected. sb1 didn't receive heartbeats for a while, so it started election. But sb0 disagreed and declined the vote request. Once connection between sb0 and sb1 was repaired, sb0 sent out the next heartbeat or append request and since sb1 didn't receive anything while connection was broken, sb1 replied with a mismatch. sb0 should then send a new snapshot to sb1 and they should work just fine afterwards.

FWIW, fixes were applied to all supported branches of OVS down to 2.17 (2.16 is EoL for quite some time already).

This is not really an OVN issue but an ovsdb RAFT one. Moreover, the OVS patches are applied to all supported branches. Therefore closing. Thanks @igsilya @gujun4990!