superfly/litefs

Cluster ID

benbjohnson opened this issue

Currently, LiteFS uses loose cluster membership: nodes simply accept the state of whichever node is primary. This makes it easy to get up and running. However, it can be a problem if a user accidentally connects two existing clusters together. In that case, only one node will become primary, and the nodes from both clusters will sync to its state, so the data from the other cluster is lost.

Cluster ID generation

To prevent this, we suggest adding a randomly generated Cluster ID to LiteFS. This ID will be automatically generated when a node moves to its "ready" state:

  1. After a node becomes primary, or
  2. After a node connects to the primary and performs a sync.

The cluster ID will be generated once and saved to a clusterid file in the root of the LiteFS data directory.

For Consul-based leases, the cluster ID should also be persisted long-term to the "${lease.consul.key}/clusterid" key if that key is not already set. Any node attempting to become primary should check this key to ensure that it is not becoming the primary of a different cluster.

Preventing Conflicts

Any time a node connects to another node (e.g. POST /stream), the replica will set a Litefs-Cluster-Id request header (if it has a cluster ID) and the primary will set a Litefs-Cluster-Id response header. The primary and replica should each reject a request or response that carries a cluster ID different from their own. If the replica does not have a cluster ID yet, it should adopt the cluster ID of the primary.