tarantool/tarantool-operator

Leader switch


Problem statement

The leader is elected only once, during the initial cluster bootstrap.

Possible solution

  1. Annotate the Cluster resource with a field denoting the Tarantool Cartridge cluster-wide config generation.

  2. In case of leader failure, cross-check the annotation against every instance's config generation.

  3. Select any alive instance with a matching config generation as the new leader (see the sketch after this list).
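
A minimal Go sketch of steps 2–3, assuming the operator can read both the annotated generation and each instance's applied generation; `Instance`, `ConfigGeneration`, and `PickLeader` are hypothetical names for illustration, not operator API:

```go
package leader

import "errors"

// Instance is a hypothetical, simplified view of a Tarantool pod as the
// operator might see it: whether it is reachable and which cluster-wide
// config generation it has applied.
type Instance struct {
	Name             string
	Alive            bool
	ConfigGeneration int64 // generation reported by the instance
}

// PickLeader implements step 3 of the proposal: given the generation stored
// in the Cluster resource annotation (step 1), select any alive instance
// whose applied config generation matches it (step 2).
func PickLeader(annotatedGen int64, instances []Instance) (*Instance, error) {
	for i := range instances {
		inst := &instances[i]
		// Cross-check the annotation against the instance's own state.
		if inst.Alive && inst.ConfigGeneration == annotatedGen {
			return inst, nil
		}
	}
	return nil, errors.New("no alive instance matches the annotated config generation")
}
```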

Hi dear team! Are there any plans to fix this issue, or any recommended workarounds, so the cluster can recover automatically after the current leader fails?

R-omk commented

Related: tarantool/tarantool-operator-ee#11

Still valid?

Affirmative, sir. We still clear and reassemble the cluster manually quite regularly.

R-omk commented

There is no need to save the IP address, because it can change after the pod is deleted.

UPD: on the other hand, the endpoint will probably also be removed.

R-omk commented

After data loss, we need to choose a leader that has a non-empty disk. Otherwise the cluster config will be lost (overwritten with an empty one), and the topology may become incorrect if we do not fix issue #116.

In other words, only the node that has the most recent version of the cluster config can be the leader.
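
A minimal sketch of that rule, assuming the operator can observe each instance's liveness, disk state, and applied config generation; `Candidate`, `DiskEmpty`, and `PickMostRecent` are hypothetical names, not operator API:

```go
package leader

import "errors"

// Candidate is a hypothetical view of an instance after a failure: whether it
// is reachable, whether its disk still holds data, and which cluster-wide
// config generation it last applied.
type Candidate struct {
	Name             string
	Alive            bool
	DiskEmpty        bool
	ConfigGeneration int64
}

// PickMostRecent selects the alive, non-empty-disk candidate with the highest
// applied config generation, so that a node with a stale or empty config can
// never become leader and overwrite the cluster-wide config.
func PickMostRecent(candidates []Candidate) (*Candidate, error) {
	var best *Candidate
	for i := range candidates {
		c := &candidates[i]
		if !c.Alive || c.DiskEmpty {
			continue // an empty disk would publish an empty config
		}
		if best == nil || c.ConfigGeneration > best.ConfigGeneration {
			best = c
		}
	}
	if best == nil {
		return nil, errors.New("no alive candidate with data on disk")
	}
	return best, nil
}
```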