/cs733

Primary LanguageC++

#Distributed File System using RAFT

Given is a simple implementation of a distributed file system which makes use of the RAFT consensus protocol in order to distribute the file system across a cluster of servers (5 in this case) and ensure that each of the servers agrees on the series of actions to be performed locally in their state machines. It is comprised of the following parts

fs - A simple network file server

fs is a simple network file server. Access to the server is via a simple telnet compatible API. Each file has a version number, and the server keeps the latest version. There are four commands, to read, write, compare-and-swap and delete the file.

fs files have an optional expiry time attached to them. In combination with the cas command, this facility can be used as a coordination service, much like Zookeeper.

Sample Usage

> go run server.go & 

> telnet localhost 8080
  Connected to localhost.
  Escape character is '^]'
  read foo
  ERR_FILE_NOT_FOUND
  write foo 6
  abcdef
  OK 1
  read foo
  CONTENTS 1 6 0
  abcdef
  cas foo 1 7 0
  ghijklm
  OK 2
  read foo
  CONTENTS 2 7 0
  ghijklm

Command Specification

The format for each of the four commands is shown below,

Command Success Response Error Response
read filename \r\n CONTENTS version numbytes exptime remaining\r\n
content bytes\r\n
ERR_FILE_NOT_FOUND
write filename numbytes [exptime]\r\n
content bytes\r\n
OK version\r\n
cas filename version numbytes [exptime]\r\n
content bytes\r\n
OK version\r\n ERR_VERSION newversion
delete filename \r\n OK\r\n ERR_FILE_NOT_FOUND

In addition the to the semantic error responses in the table above, all commands can get two additional errors. ERR_CMD_ERR is returned on a malformed command, ERR_INTERNAL on, well, internal errors. There is an additional error ERR_REDIRECT <url> that the client receives in case it is sending the commands to the node which is not the leader.

For write and cas and in the response to the read command, the content bytes is on a separate line. The length is given by numbytes in the first line.

Files can have an optional expiry time, exptime, expressed in seconds. A subsequent cas or write cancels an earlier expiry time, and imposes the new time. By default, exptime is 0, which represents no expiry.

raft - A consensus protocol

raft is the layer which is responsible for command replication over each of the file servers in the cluster, and to ensure their consistency. It ensures that the command is handed over to the state machine of a given file server instance of the cluster, only if it has been successfully replicated over a majority of the servers in the cluster.

This coordination is achieved by the leader node, which is one of the nodes in the cluster, which has been voted for by a majority in the cluster.

Input Events

The format for each of the input events for the raft state machine is shown below,

Event Success Response Error Response
Append (data) Commit (node id, index, term, data, nil) Commit (node id, -1, 0, data, error message)
Timeout () Alarm (timeout)
AppendEntriesRequest (term, logterm, leaderId, prevLogIndex, prevLogTerm, data, leaderCommit) AppendEntriesResponseEvent (id, index, term, data, true) AppendEntriesResponseEvent (id, index, term, data, false)
VoteRequestEvent (candidateId, term, logIndex, logTerm) VoteResponseEvent(id, term, true) VoteRequestEvent(id, term, false)

Output Actions

The format of the output actions generated by the raft state machine is given below,

Action Description
Send (peerId, event) Sends the event i.e AppendEntries Request/Response or Vote Resquest/Response to the given peerId
Commit (index, data, err) Sends index+data if commit is successful, else sends data+err to report an error
Alarm (time) Sends a Timeout event after time milliseconds
LogStore (index, term, data) Indication to append data to the log at the given index
SaveState (curTerm, votedFor, commitIndex) Stores the given data i.e. curTerm, votedFor, commitIndex to persistent storage for recovery

Install

go get github.com/SrishT/assignment4
go test github.com/SrishT/assignment4/...