
A fleet of Raft consensus groups


RaftFleet

Elixir library to run multiple Raft consensus groups in a cluster of ErlangVMs


Features & Design

  • Easy hosting of multiple "cluster-wide state"s
  • Reasonably scalable placement of processes for multiple Raft consensus groups
    • consensus member processes are distributed to ErlangVMs in a data center-aware manner using rendezvous hashing
    • automatic rebalancing on adding/removing nodes
  • Each consensus group leader is accessible using the name of the consensus group (which must be an atom)
    • Actual pids of consensus leader processes are cached in a local ETS table for fast access
  • Flexible data model (defined by rafted_value)
  • Decentralized architecture and fault tolerance
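
To make the placement idea more concrete, here is a minimal sketch of zone-aware rendezvous (highest-random-weight) hashing. It is only an illustration of the general technique and is not raft_fleet's actual implementation; the module name `RendezvousSketch`, the function `pick_members/3`, and the use of `:erlang.phash2/1` as the weight function are all assumptions made for this example.

```elixir
defmodule RendezvousSketch do
  @moduledoc """
  Illustrative only: picks member nodes for a consensus group with
  rendezvous hashing. This is NOT raft_fleet's actual code.
  """

  # `nodes_by_zone` is a map of zone ID => list of nodes in that zone.
  # Returns up to `n_members` nodes, spread over the zones.
  def pick_members(group_name, nodes_by_zone, n_members) do
    nodes_by_zone
    |> Enum.map(fn {zone, nodes} ->
      # Rank the nodes inside each zone by their weight for this group.
      ranked = Enum.sort_by(nodes, &weight(group_name, &1), :desc)
      {weight(group_name, zone), ranked}
    end)
    # Order the zones themselves by weight, highest first.
    |> Enum.sort_by(fn {zone_weight, _ranked} -> zone_weight end, :desc)
    |> Enum.map(fn {_zone_weight, ranked} -> ranked end)
    |> round_robin()
    |> Enum.take(n_members)
  end

  # Deterministic pseudo-random weight for a (group, candidate) pair.
  defp weight(group_name, candidate), do: :erlang.phash2({group_name, candidate})

  # Take the best node of every zone first, then the second best, and so on.
  defp round_robin(lists) do
    case Enum.reject(lists, &(&1 == [])) do
      [] -> []
      non_empty -> Enum.map(non_empty, &hd/1) ++ round_robin(Enum.map(non_empty, &tl/1))
    end
  end
end

zones = %{"zone1" => [:n1, :n3], "zone2" => [:n2, :n4]}
IO.inspect(RendezvousSketch.pick_members(:consensus1, zones, 3))
```

Because every node computes the same weights from the same inputs, all nodes agree on the placement without coordination, and removing a node that was not selected for a group leaves that group's members unchanged.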

Notes on backward compatibility

  • Users of <= 0.6.0 should upgrade to 0.6.1 before upgrading to 0.7.x due to a change in internal data structure. While <= 0.6.0 and 0.7.x are not compatible, 0.6.1 should be able to interact with both <= 0.6.0 and 0.7.x.

Example

Suppose we have a cluster of 4 Erlang nodes:

$ iex --sname 1 -S mix
iex(1@skirino-Manjaro)>

$ iex --sname 2 -S mix
iex(2@skirino-Manjaro)> Node.connect(:"1@skirino-Manjaro")

$ iex --sname 3 -S mix
iex(3@skirino-Manjaro)> Node.connect(:"1@skirino-Manjaro")

$ iex --sname 4 -S mix
iex(4@skirino-Manjaro)> Node.connect(:"1@skirino-Manjaro")

Load the following module, which implements the RaftedValue.Data behaviour, on all nodes in the cluster.

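defmodule JustAnInt do
  @behaviour RaftedValue.Data
  # Initial value of the replicated integer.
  def new(), do: 0
  # command/2 returns {value returned to the caller, new state};
  # here both commands return the previous value to the caller.
  def command(i, {:set, j}), do: {i, j    }
  def command(i, :inc     ), do: {i, i + 1}
  # query/2 is read-only and returns the current value.
  def query(i, :get), do: i
end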

Call RaftFleet.activate/1 on all nodes.

iex(1@skirino-Manjaro)> RaftFleet.activate("zone1")

iex(2@skirino-Manjaro)> RaftFleet.activate("zone2")

iex(3@skirino-Manjaro)> RaftFleet.activate("zone1")

iex(4@skirino-Manjaro)> RaftFleet.activate("zone2")

Create 5 consensus groups each of which replicates an integer and has 3 consensus members.

iex(1@skirino-Manjaro)> rv_config = RaftedValue.make_config(JustAnInt)
iex(1@skirino-Manjaro)> RaftFleet.add_consensus_group(:consensus1, 3, rv_config)
iex(1@skirino-Manjaro)> RaftFleet.add_consensus_group(:consensus2, 3, rv_config)
iex(1@skirino-Manjaro)> RaftFleet.add_consensus_group(:consensus3, 3, rv_config)
iex(1@skirino-Manjaro)> RaftFleet.add_consensus_group(:consensus4, 3, rv_config)
iex(1@skirino-Manjaro)> RaftFleet.add_consensus_group(:consensus5, 3, rv_config)

Now we can run query/command from any node in the cluster:

iex(1@skirino-Manjaro)> RaftFleet.query(:consensus1, :get)
{:ok, 0}

iex(2@skirino-Manjaro)> RaftFleet.command(:consensus1, :inc)
{:ok, 0}

iex(3@skirino-Manjaro)> RaftFleet.query(:consensus1, :get)
{:ok, 1}
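
Note that the `:inc` command above returned `{:ok, 0}`, not `{:ok, 1}`: the first element of the tuple returned by `command/2` is what the caller receives, and in this module that is the value before the change. The following sketch calls the same callbacks directly, without a cluster. `JustAnIntLocal` is a hypothetical copy of the module with the `@behaviour` line dropped so that it runs without the rafted_value dependency.

```elixir
defmodule JustAnIntLocal do
  # Same callbacks as JustAnInt, minus the @behaviour declaration.
  def new(), do: 0
  def command(i, {:set, j}), do: {i, j}
  def command(i, :inc), do: {i, i + 1}
  def query(i, :get), do: i
end

state0 = JustAnIntLocal.new()
# Each command yields {value returned to the caller, new state}.
{ret1, state1} = JustAnIntLocal.command(state0, :inc)
{ret2, state2} = JustAnIntLocal.command(state1, {:set, 10})

IO.inspect({ret1, state1})                      # => {0, 1}
IO.inspect({ret2, state2})                      # => {1, 10}
IO.inspect(JustAnIntLocal.query(state2, :get))  # => 10
```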

Activating/deactivating a node in the cluster triggers rebalancing of consensus member processes.

Deployment notes

To run raft_fleet within an ErlangVM cluster, the following are our general recommendations.

The cluster should consist of at least 3 nodes to tolerate 1 node failure. Similarly, cluster nodes should span 3 (or more) data centers, so that the system keeps functioning in the face of 1 data center failure.

When you add new ErlangVM nodes, each node should run the following initialization steps:

  1. establish connections to other running nodes,
  2. call RaftFleet.activate/1.

These steps are typically done within start/2 of the main OTP application. Information about the other running nodes should be available from, e.g., an IaaS API.
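
The two initialization steps could be sketched as follows. This is a rough outline rather than a recommended implementation: the application name `:my_app`, the config keys `:peer_nodes` and `:zone`, and the module names are all hypothetical, and the peer list is read from application config only as a stand-in for whatever discovery mechanism (such as an IaaS API) is actually used.

```elixir
defmodule MyApp.Application do
  # Hypothetical application module; `:my_app`, `:peer_nodes` and `:zone`
  # are made-up names used only for this sketch.
  use Application

  @impl true
  def start(_type, _args) do
    peers = Application.get_env(:my_app, :peer_nodes, [])
    zone = Application.get_env(:my_app, :zone, "zone1")

    # Step 1: establish connections to the other running nodes.
    Enum.each(peers -- [Node.self()], &Node.connect/1)

    # Step 2: activate this node so that it can host consensus members.
    # The return value is ignored here for brevity.
    RaftFleet.activate(zone)

    Supervisor.start_link([], strategy: :one_for_one, name: MyApp.Supervisor)
  end
end
```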

When terminating a node you should proceed as follows (although raft_fleet tolerates failures that don't break quorums, it's much better to tell raft_fleet to make preparations beforehand):

  1. call RaftFleet.deactivate/0 within the node-to-be-terminated,
  2. wait for a while (say, 10 min) so that existing consensus group members are migrated to the other nodes, then
  3. finally shut down the node.
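
The termination steps above might look like the following when run on the node that is about to go away. The module and function names are hypothetical, the fixed sleep is the simplest possible way to "wait for a while", and the 10-minute default merely mirrors the rough figure given above.

```elixir
defmodule FleetShutdownSketch do
  # 10 minutes, following the rough figure in the steps above.
  @default_wait_ms 10 * 60 * 1000

  def leave_and_stop(wait_ms \\ @default_wait_ms) do
    # Step 1: ask raft_fleet to stop hosting consensus members on this node.
    RaftFleet.deactivate()

    # Step 2: give existing member processes time to migrate to other nodes.
    Process.sleep(wait_ms)

    # Step 3: shut down the VM gracefully.
    System.stop()
  end
end
```

A function like this could be invoked from a deployment script, for example through a remote shell or an RPC call, before the underlying machine is terminated.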

Links