Implementation of Raft algorithm: Design ToC: Introduction -- quick description of raft Behavior of nodes + structs required to represent state of the node Leader Election How replicated log is handled ############################################ ############################################ Things to be done: - Read through and complete description of behaviour of servers in cluster - Describe structs and methods to implement said behaviour - Write steps for implementation... desribe steps as a git commit ############################################ ############################################ Introduction: Raft is a consensus algorithm. Raft achieves consensus by first electing a leader, then have the leader be responsible for managing the replicated log. Once a leader has been elected, all nodes will forward log entry requests to the leader. The leader will determine the order of the log entries and inform the servers when it is safe to store an entry. If the leader ceases action, the node that notices the leader has died will trigger an election. Raft can be decomposed into 3 major subproblems: Leader election - determining when a leader needs to be elected and then the actually electing of a new leader Log replication - How the leader receives and handles log entry requests Safety - Nodes in the raft cluster cannot have different entries for the same log index Raft guarantees: - At most there will be one leader per term - A leader cannot change old log entries, only add new ones - Entries from the same index and term will be identical - If a log is committed by the leader, subsequent leaders must have that log entry - If a node has applied an entry of a given index to its state machine no other server will apply a different entry for that index The leader node does the following things: - receive requests from nodes - send notifications that log entries should be committed - send heartbeat so nodes know the leader is alive In this implementation log requests will send an rpc message to a node with a string to be stored. For now the log entries will be stored in memory rather than written to persistent storage. Node behaviour: A node can have three states: follower, candidate, leader. A node will behave differently depending on its state. Follower behaviour: - Listens for log entry requests and messages from the leader - Forwards log entry requests to leader - Appends log entries when it receives commit messages from the leader - If it ceases to receive messages from the leader a follower node will turn itself into a candidate and send voteRequest messages to all nodes in the cluster. - Nodes must always track cluster membership - In the case of an election a follower node will vote for the node from which it received the voteRequest rpc, unless the sending node is behind. If the sending node is behind, the receiver will declare itself a candidate, vote for itself and broadcast voteRequest rpcs to all members of the cluster - Followers will respond to leader heartbeats to indicate they are alive and to tell the leader their highest committed index. This allows the leader to update its matchIndex list // leader heartbeat func (r *RaftNode) handleAppendEntriesRequest (AEproto) {} // if follower has an entry that conflicts with new entries from leader (same index // different term) delete existing entries and follow the leader // append any new entries to the log // if leaderCommit > followerCommit set followerCommit to min(leaderCommit, // highest commit index in the request) // send ACK to leader so leader knows node is alive func (r *RaftNode) handleVoteRequest (VRproto) { // check if requester term and commitIndex are > r.term and r.commitIndex // if yes reply voteGranted + update term and candidateId else reply // voteGranted=False } func (r *RaftNode) logEntryRequest () { if follower: fwd to leader elif leader: update log and forward entry with next append entries msg } Follower rpc messages: - leader heartbeat (appendEntries) - voteRequest - if another node triggers an election the follower may receive a voteReq rpc will include highest commit index of sender, follower responds to voteReq with another voteReq with non-nil granted field - addNode - node joined cluster add to list of members Candidate behaviour: - Sends voteRequest rpc to every node in the cluster and waits for responses - Tallies votes and declares itself leader if it achieves a quorum - Sends leader heartbeat to declare itself leader Candidate rpc messages: - voteRequest - voteResponse (receive) - leader heartbeat (on election victory) Leader behaviour: - sends heartbeat (appendEntries) to all cluster members, heartbeat msg is repeated during idle periods to prevent the timeouts that would trigger a needless election - receives requests from all cluster members (join, log entry) - if leader receive logRequest from client it appends that entry to its local log and responds to requesting node after entry is applied to state machine - leader tracks nextIndex and matchIndex for each node in the cluster nextIndex is the next unfilled index on a node - sends new member messages - sends remove member message Leader rpc messages: - leader heartbeat - commitEntry - join request - log entry request RaftNode Struct: The RaftNode struct will contain all information the node knows about the cluster as well as the node's log. RaftNode will contain the following: - clusterMembers - list of members addresses - state - node's state (candidate or follower or leader) - currentTerm - number of terms node has been alive -- a new term is triggered with each leader election - votedFor - candidateId (addr) this node voted for in current term - log - slice containing all log entries this node has committed - commitIndex - highest committed log entry index - lastApplied - index of highest log entry applied to state machine (may not yet be committed) enum state { follower candidate leader } type RaftNode struct { leader string // addr of current leader of the cluster members []string // contains member addresses state enum.Type // follower etc. currentTerm int votedFor string // id of candidate voted for in this term log []*logEntry // slice containing log entries commitIndex int // index in log known to be committed lastApplied int // highest index applied to state machine } func (r *RaftNode) applyToStateMachine(entry *logEntry) { // apply entry to state machine r.lastApplied += 1 } func (r *RaftNode) commit(entry *logEntry) { // commit entry (whatever that means) r.commitIndex += 1 } LeaderNode Struct: When a RaftNode changes state to leader it will create a LeaderNode struct with the following information: - nextIndex - index of next log entry to be sent to each node in the cluster - for each member this value is initialized to n+1 where n is the leaders highest committed index - matchIndex - for each member, lists highest index known to be committed List of RPCs: RequestVote: term, candidateId (address of node requesting vote), lastLogIndex (index of candidates highest commit), lastLogTerm (term of candidates last log entry), term (for candidate to set), voteGranted (set by receiver) - receiver sets voteGranted to false if term (of candidate) < node.CurrentTerm - set voteGranted true if candidates log is >= receivers - if voteGranted field is nil then the receiver must respond to this request - if voteGranted is !nil then it is a response Handling RPC messages and responses: All nodes must be able to listen for and handle multiple message types. Rather than truly have multiple messages there will be one rpc type: RaftRPC that will include an interface. This allows RaftRPC to wrap the other RPC messages. Once the message is received the response will be handled based on the type inferred from the message contents. - for now (until we handle client requests) followers act as client for other node's heartbeats and voteRequests - followers should listen on some port for incoming connections and respond to rpcs it receives when it is connected to LeaderHeartbeat / Append new entries: leader's term, leader addr, log index preceding the new ones, prevLogTerm: term associated with prevLogIndex entry, a list of entries: entries []*logEntry, leaderCommit: leader's highest committed index - follower responds by setting term and success in response - follower responsed false if its currentTerm is > term - if follower has an entry that conflicts with new entries from leader (same index different term) delete existing entries and follow the leader - append any new entries to the log - if leaderCommit > followerCommit set followerCommit to min(leaderCommit, highest commit index in the request) Step 1: Implementing Leader Election Implement structure of Raft: - implement node structs - right RPC messages - provide a way for nodes to communicate based on their config - implement leader election - write harness that will launch a raft cluster based on a config - write leader election integration tests What it should do: - a node should be able to run the raft service - given a 'well known address' the node should be able to join a 'cluster' for now the assumption will be that the node with the WKN doesn't go down - nodes should be able to elect a leader - log entry requests don't yet matter; all that matters is that a leader is elected - the leader will send heartbeats to ensure timeouts work properly, then the test will kill the leader and verify that an election is triggered - 'nodes' will probably be docker containers running raft that are bound to a particular port on localhost - failure modes to be handled: leader dies and election is triggered - handled in the next stage: leader misses requests and falls behind; leader becomes unresponsive but comes back up later