hashicorp/raft

Potentially misleading documentation in BootstrapCluster method

Closed this issue · 4 comments

sam3d commented

I'm observing different behaviour than the documented behaviour for func (*Raft) BootstrapCluster. The documentation for this method says the following:

BootstrapCluster is equivalent to non-member BootstrapCluster but can be called on an un-bootstrapped Raft instance after it has been created. This should only be called at the beginning of time for the cluster, and you absolutely must make sure that you call it with the same configuration on all the Voter servers. There is no need to bootstrap Nonvoter and Staging servers.

However, it appears that this method need only be called once, on a single server when the cluster is first created, rather than on all of the voting servers. This holds whether:

  1. All of the servers are defined in the raft.Configuration, or
  2. Only the first server is defined in the configuration and additional nodes are added using func (*Raft) AddVoter

Any additional calls to BootstrapCluster from any participating voter server cause the following error to occur:

bootstrap only works on new clusters

Any idea if this is the expected behaviour or erroneous documentation for this method?

Hi @sam3d!

You're right, this documentation is definitely unclear. I don't think it's wrong, though, and we should clarify it!

  • One way to bring up a cluster is to call BootstrapCluster on a single node (listing only that node in your configuration) and then have each of the other nodes join. This is noted in the docs here. This approach corresponds to the Consul flag -bootstrap. In this case, the single node that calls BootstrapCluster will always be the leader.

  • A second way to bring up a cluster is to initialize each of the voter nodes with the same configuration, and then have each of them call BootstrapCluster. Each of the nodes will then attempt to self-elect, and only one of them will win. This means that the other nodes can safely ignore the error, as it's an expected outcome. The documentation doesn't note this, and should. This approach corresponds to the -bootstrap-expect flag in Consul, wherein all the nodes in the cluster wait for Serf to register membership before each tries to bootstrap itself as the leader.

Thank you for bringing this to our attention! Modifying the docs to clarify the above would help to better align the docs with Consul and assist anyone using this library directly.

Would you be interested in submitting a PR for this change?

sam3d commented

Thank you for the response and the super helpful clarification, it’s really appreciated!

When I’m back at my laptop I’d be happy to put together a PR for this (apologies for the brevity, I’m on mobile).

One final question: if only a single node is able to self-elect as leader of a cluster (it’s the only node that calls the BootstrapCluster method, with five nodes in its configuration), and that node later fails, are the other nodes still able to participate in elections even though they never called BootstrapCluster (and so never had an error to ignore)?

Yes, if an elected leader disappears, the other voting nodes will enter into leader election, and one of them will be promoted to leader, regardless of whether they called BootstrapCluster or not.
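For the five-node case in the question, the arithmetic behind this can be sketched as follows, assuming the standard Raft rule that a candidate needs votes from a majority of the configured voters. The function names quorum and canElect are hypothetical and are not part of the library.

```go
package main

import "fmt"

// quorum returns the number of votes needed to win an election among the
// given number of configured voters (a strict majority).
func quorum(voters int) int {
	return voters/2 + 1
}

// canElect reports whether the voters that are still alive are numerous
// enough to form a majority and therefore elect a new leader.
func canElect(voters, failed int) bool {
	return voters-failed >= quorum(voters)
}

func main() {
	// Five voters, with the original leader lost: four remain, three are needed.
	fmt.Println(quorum(5), canElect(5, 1))
	// With three of the five voters lost, no majority can be formed.
	fmt.Println(canElect(5, 3))
}
```

So losing the node that called BootstrapCluster leaves four of five voters, which is still enough to elect a replacement.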

Thanks very much @sam3d :D

Thanks again for this!