Potentially misleading documentation in BootstrapCluster method
I'm observing different behaviour than the documented behaviour for func (*Raft) BootstrapCluster. The documentation for this method says the following:
BootstrapCluster is equivalent to non-member BootstrapCluster but can be called on an un-bootstrapped Raft instance after it has been created. This should only be called at the beginning of time for the cluster, and you absolutely must make sure that you call it with the same configuration on all the Voter servers. There is no need to bootstrap Nonvoter and Staging servers.
However, it appears as though this method need only be called once, at the beginning of the cluster on only a single server, not all of the voting servers. In the cases where either:
- All of the servers are defined in the raft.Configuration, or
- Only the first server is defined in the configuration and additional nodes are added using func (*Raft) AddVoter

any additional calls to BootstrapCluster from any participating voter server cause the following error to occur:
bootstrap only works on new clusters
Any idea if this is the expected behaviour or erroneous documentation for this method?
Hi @sam3d !
You're right, this is definitely unclear documentation. However, I don't think it's wrong, though we should clarify it!
- One way to bring up a cluster is to call BootstrapCluster on a single node (listing only that node in your configuration) and then have each of the other nodes join. Noted in the docs here. This approach corresponds to the Consul flag -bootstrap. In this case, the single node calling BootstrapCluster will always be the leader.
- A second way to bring up a cluster is to initialize each of the voter nodes with the same configuration, and then have each of them call BootstrapCluster. Then, each of the nodes will attempt to self-elect, and only one of them will win. This means that the other nodes can safely ignore the error, as it's an expected outcome. The documentation doesn't note this, and should. This approach corresponds to the -bootstrap-expect flag in Consul, wherein all the nodes in the cluster wait for Serf to register membership before each trying to bootstrap themselves as the leader.
Thank you for bringing this to our attention! Modifying the docs to clarify the above would help to better align the docs with Consul and assist anyone using this library directly.
Would you be interested in submitting a PR for this change?
Thank you for the response and the super helpful clarification, it’s really appreciated!
When I’m back at my laptop I’d be happy to put together a PR for this (apologies for the brevity, I’m on mobile).
One final question: if only a single node is able to self elect as leader of a cluster (it’s the only node that calls the BootstrapCluster method, with five nodes in its configuration), and then after a while that node fails, are the other nodes still able to participate in elections even if they didn’t also call BootstrapCluster and ignore the error?
Yes, if an elected leader disappears, the other voting nodes will enter into leader election, and one of the voting nodes will be promoted to leader, regardless of whether they called BootstrapCluster or not.
Thanks very much @sam3d :D
Thanks again for this!