[F19] Distributed Systems - Course Project

This Distributed File System was written as a project for the Innopolis University Distributed Systems course, Fall 2019.

Installation

Naming server

  1. Run a server instance and install docker-compose on it.
  2. Clone the docker-compose file from the GitHub repository.
  3. Run docker-compose up

Storage servers

  1. Run the required number of instances and install docker-compose on them.
  2. Clone the docker-compose file from the GitHub repository.
  3. In the docker-compose.yml file, in the command line, specify the private address of the naming server and the storage server's own private address, respectively.
  4. Run docker-compose up

Client

Docker way

  1. Download docker-compose.yml for the Client
  2. Run docker-compose up -d in the download directory
  3. Run docker ps to get the container ID
  4. Run docker exec -ti <container ID> /bin/bash
  5. Inside the container run ./Client <naming server IP>
  6. Here you go.

JAR way

  1. Download the archive
  2. Extract the JAR
  3. Run java -jar Client.jar <naming server IP>
  4. Here you go.

Specification

Functionality

Users of the file system should be able to perform certain operations on files and directories. On files: upload, download, create, remove, copy, move, get info. On directories: create, remove, list.

All files are replicated on multiple storage servers so that the DFS is fault-tolerant: when a storage node fails or goes offline, the data remains accessible on the other storages that keep a replica of the file.

High-level System Diagram

The system consists of a Client, a Naming Server and multiple Storage Servers.

The Naming Server keeps the filesTree: the structure of directories and files, plus file metadata. It assigns each file an ID by which Storages or the Client can access it. The Naming Server also keeps the IP of each running Storage Server.

A Storage Server keeps files. Every 5 seconds it pings the Naming Server with a heartbeat. If no heartbeat arrives within 10 seconds, the Naming Server removes the Storage from the list of running Storages.
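
The heartbeat bookkeeping described above can be sketched as follows. This is a minimal illustration, not the project's actual code: the class name HeartbeatRegistry, its methods, and the use of caller-supplied millisecond timestamps are assumptions. Only the 10-second timeout comes from the text.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch of Naming Server heartbeat tracking (names are assumptions). */
class HeartbeatRegistry {
    static final long TIMEOUT_MILLIS = 10_000; // 10-second limit from the text

    // storage IP -> timestamp (ms) of the last heartbeat received
    private final Map<String, Long> lastSeen = new ConcurrentHashMap<>();

    /** Record a heartbeat from a Storage Server. */
    void heartbeat(String storageIp, long nowMillis) {
        lastSeen.put(storageIp, nowMillis);
    }

    /** Drop every Storage whose last heartbeat is older than the timeout. */
    void evictExpired(long nowMillis) {
        lastSeen.values().removeIf(seen -> nowMillis - seen > TIMEOUT_MILLIS);
    }

    /** IPs of the Storages currently considered running, sorted for stable output. */
    Set<String> running() {
        return new TreeSet<>(lastSeen.keySet());
    }
}
```

A Naming Server built this way could call evictExpired on a timer and consult running() when it needs to pick Storages for an upload. Passing the clock value in as a parameter keeps the class easy to test.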

The Client is a console application that lets the user perform a set of operations on files and directories.

Client commands

  1. init - clear all

  2. touch <new_file_path> - create empty file

  3. get <remote path> (local path) - download file

  4. put <local path> (remote path) - upload file

  5. rm <path> - delete file

  6. info <path> - file info

  7. cp <from> <to> - copy file

  8. mv <from> <to> - move file

  9. cd <path> - open directory

  10. ls (path) - read directory

  11. mkdir (path) - create directory

  12. help - list available commands

  13. setdd <path> - set a directory for downloads

  14. getdd - get current directory for downloads

<arg> - required argument, (arg) - optional argument
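
As an illustration of the <arg> / (arg) convention, the sketch below checks whether an input line has an accepted number of arguments for each command. The class CommandSpec and its isValid method are hypothetical and not taken from the project; the argument counts are copied from the list above. It assumes whitespace-separated tokens, so paths containing spaces are not handled.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch of client-side argument checking (not the project's parser). */
class CommandSpec {
    final int required; // number of <arg> arguments
    final int optional; // number of (arg) arguments

    CommandSpec(int required, int optional) {
        this.required = required;
        this.optional = optional;
    }

    // Argument counts copied from the command list in this README.
    static final Map<String, CommandSpec> SPECS = new LinkedHashMap<>();
    static {
        SPECS.put("init", new CommandSpec(0, 0));
        SPECS.put("touch", new CommandSpec(1, 0));
        SPECS.put("get", new CommandSpec(1, 1));
        SPECS.put("put", new CommandSpec(1, 1));
        SPECS.put("rm", new CommandSpec(1, 0));
        SPECS.put("info", new CommandSpec(1, 0));
        SPECS.put("cp", new CommandSpec(2, 0));
        SPECS.put("mv", new CommandSpec(2, 0));
        SPECS.put("cd", new CommandSpec(1, 0));
        SPECS.put("ls", new CommandSpec(0, 1));
        SPECS.put("mkdir", new CommandSpec(0, 1));
        SPECS.put("help", new CommandSpec(0, 0));
        SPECS.put("setdd", new CommandSpec(1, 0));
        SPECS.put("getdd", new CommandSpec(0, 0));
    }

    /** True if the line names a known command with an accepted number of arguments. */
    static boolean isValid(String line) {
        String[] tokens = line.trim().split("\\s+");
        CommandSpec spec = SPECS.get(tokens[0]);
        if (spec == null) {
            return false; // unknown command or empty line
        }
        int args = tokens.length - 1;
        return args >= spec.required && args <= spec.required + spec.optional;
    }
}
```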

Storage-Naming-Client commands implementation

init

Possible Acknowledgements: OK

touch

Possible Acknowledgements: OK, INCORRECT_NAME, FILE_OR_DIRECTORY_DOES_NOT_EXIST, FILE_OR_DIRECTORY_ALREADY_EXISTS

get

  • storageIp - IP of the Storage from which the Client should download the file
  • fileId - globally unique file ID
Possible Response codes: OK, TOUCHED, INCORRECT_NAME, FILE_OR_DIRECTORY_DOES_NOT_EXIST, NO_NODES_AVAILABLE

put

  • storageIP - IP of the Storage to which the Client should upload the file
  • fileId - globally unique file ID
  • [replicasIPs] - IPs of Storages to which the file should be replicated

At step 4, after the file has been uploaded to the primary Storage, that Storage acknowledges the Client and the Naming Server. Right after that, it starts sending the file to the replica Storages; each replica notifies the Naming Server once its upload finishes.

Possible Response codes: OK, NO_NODES_AVAILABLE, INCORRECT_NAME, FILE_OR_DIRECTORY_DOES_NOT_EXIST, FILE_OR_DIRECTORY_ALREADY_EXISTS 

rm

Possible Acknowledgements: OK, CONFIRMATION_REQUIRED, FILE_OR_DIRECTORY_DOES_NOT_EXIST

Each Storage pings the Naming Server with a heartbeat every 5 seconds. Every 6th heartbeat is a fetchFile request. In response, the Naming Server sends the list of fileIDs that the Storage should keep (all other, excess files are removed). The Naming Server also sends a list of tuples {fileID, storageIP}: the Storage should request these files from the corresponding Storages and download them (this handles replication failures during upload from the Client).
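
The reconciliation a Storage performs on a fetchFile response can be sketched as below. This is an assumption-laden illustration, not the project's code: the class FetchFileSync, the nested Download type, and all method names are hypothetical, file identifiers are modelled as plain strings, and heartbeats are assumed to be counted from 1.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch of how a Storage might apply a fetchFile response. */
class FetchFileSync {
    /** One {file, storage} tuple: fetch this file from that Storage. */
    static final class Download {
        final String fileId;
        final String storageIp;

        Download(String fileId, String storageIp) {
            this.fileId = fileId;
            this.storageIp = storageIp;
        }
    }

    /** True on every 6th heartbeat; with a 5-second period that is every 30 seconds. */
    static boolean isFetchFileBeat(long heartbeatNumber) {
        return heartbeatNumber % 6 == 0;
    }

    /** Files held locally that are absent from the keep list; these get removed. */
    static Set<String> excessFiles(Set<String> localFiles, Set<String> filesToKeep) {
        Set<String> excess = new TreeSet<>(localFiles);
        excess.removeAll(filesToKeep);
        return excess;
    }

    /** Requested downloads, skipping files that are already present locally. */
    static List<Download> pendingDownloads(Set<String> localFiles, List<Download> requested) {
        List<Download> pending = new ArrayList<>();
        for (Download d : requested) {
            if (!localFiles.contains(d.fileId)) {
                pending.add(d);
            }
        }
        return pending;
    }
}
```

Because the Storage only deletes files when it next receives a keep list, an rm can be acknowledged by the Naming Server immediately and the actual removal happens on a later fetchFile cycle.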

info

Possible Acknowledgements: OK, INCORRECT_NAME, FILE_OR_DIRECTORY_DOES_NOT_EXIST

cp

The Naming Server keeps, for each unique fileID, the list of Storage paths by which the file can be reached. Thus, when copying, the Naming Server just adds a new path to the corresponding file's list.

Possible Acknowledgements: OK, INCORRECT_NAME, FILE_OR_DIRECTORY_DOES_NOT_EXIST, FILE_OR_DIRECTORY_ALREADY_EXISTS 

mv

The Naming Server keeps, for each unique fileID, the list of Storage paths by which the file can be reached. Thus, when moving, the Naming Server first adds the new path to the corresponding file's list and then deletes the fromPath from that list.

Possible Acknowledgements: OK, INCORRECT_NAME, FILE_OR_DIRECTORY_DOES_NOT_EXIST, FILE_OR_DIRECTORY_ALREADY_EXISTS
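
The cp and mv behaviour described above, where no file data moves and only the path list changes, can be sketched as follows. The class PathIndex and its methods are hypothetical; directories and parent-path checks are not modelled, and the mapping of error cases to codes is an assumption based on the Response Codes table.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch of the fileID -> paths structure on the Naming Server. */
class PathIndex {
    enum Code { OK, INCORRECT_NAME, FILE_OR_DIRECTORY_DOES_NOT_EXIST, FILE_OR_DIRECTORY_ALREADY_EXISTS }

    // fileID -> every path under which that file is reachable
    private final Map<String, Set<String>> pathsByFileId = new HashMap<>();

    /** Make an existing file reachable under the given path. */
    void register(String fileId, String path) {
        pathsByFileId.computeIfAbsent(fileId, id -> new TreeSet<>()).add(path);
    }

    /** Returns the fileID stored at the path, or null if the path is unknown. */
    String fileIdAt(String path) {
        for (Map.Entry<String, Set<String>> e : pathsByFileId.entrySet()) {
            if (e.getValue().contains(path)) {
                return e.getKey();
            }
        }
        return null;
    }

    /** Copy: add a second path to the same fileID; no data is duplicated. */
    Code cp(String from, String to) {
        if ("/".equals(from) || "/".equals(to)) return Code.INCORRECT_NAME;
        String fileId = fileIdAt(from);
        if (fileId == null) return Code.FILE_OR_DIRECTORY_DOES_NOT_EXIST;
        if (fileIdAt(to) != null) return Code.FILE_OR_DIRECTORY_ALREADY_EXISTS;
        pathsByFileId.get(fileId).add(to);
        return Code.OK;
    }

    /** Move: add the new path first, then delete the old one. */
    Code mv(String from, String to) {
        Code result = cp(from, to);
        if (result == Code.OK) {
            pathsByFileId.get(fileIdAt(to)).remove(from);
        }
        return result;
    }
}
```

The linear scan in fileIdAt keeps the sketch close to the structure named in the text; a real implementation would more likely resolve paths through the directory tree.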

ls

Possible Acknowledgements: OK, FILE_OR_DIRECTORY_DOES_NOT_EXIST

cd

The Naming Server checks whether the requested path exists in its fileTree and notifies the Client of success or failure. The current directory is displayed in the console.

Possible Acknowledgements: OK, FILE_OR_DIRECTORY_DOES_NOT_EXIST

mkdir

Possible Acknowledgements: OK, FILE_OR_DIRECTORY_DOES_NOT_EXIST, FILE_OR_DIRECTORY_ALREADY_EXISTS

Communication protocols

The project uses custom protocols built on Socket connections. Acknowledgements are sent as Objects through an ObjectOutputStream. Each object contains a Response Code and some optional fields that depend on the type of command.
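
To illustrate sending an acknowledgement as a serialized object, here is a minimal sketch. The class Acknowledgement, its fields, and the helper methods are assumptions and not the project's actual types; the enum values are the codes listed in this README. In-memory byte streams stand in for the streams a real Socket would provide via getOutputStream() and getInputStream().

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

/** Hypothetical acknowledgement object: a response code plus optional fields. */
class Acknowledgement implements Serializable {
    private static final long serialVersionUID = 1L;

    enum ResponseCode {
        OK, NO_NODES_AVAILABLE, INCORRECT_NAME, FILE_OR_DIRECTORY_DOES_NOT_EXIST,
        FILE_OR_DIRECTORY_ALREADY_EXISTS, TOUCHED, CONFIRMATION_REQUIRED
    }

    final ResponseCode code;
    final String storageIp; // optional: null when the command does not need it
    final String fileId;    // optional: null when the command does not need it

    Acknowledgement(ResponseCode code, String storageIp, String fileId) {
        this.code = code;
        this.storageIp = storageIp;
        this.fileId = fileId;
    }

    /** Serialize the acknowledgement the same way it would be written to a socket stream. */
    static byte[] write(Acknowledgement ack) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(ack);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Deserialize an acknowledgement, as the receiving side would from its input stream. */
    static Acknowledgement read(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (Acknowledgement) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Both sides need the same class on their classpath for readObject to succeed, which is one reason such classes are usually kept in a module shared by the Client, Naming Server and Storage Server.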

For file download and upload (get/put), the Client and the Storage Server use a custom TCP-like protocol.

Response Codes

| Code | Meaning | Commands |
|---|---|---|
| OK | successful execution | all |
| NO_NODES_AVAILABLE | no nodes available for uploading a file | put, get |
| INCORRECT_NAME | requested path is "/" | put, info, touch, get, cp, mv |
| FILE_OR_DIRECTORY_DOES_NOT_EXIST | requested path does not exist | put, info, get, cp, mv, cd, ls, mkdir |
| FILE_OR_DIRECTORY_ALREADY_EXISTS | path requested for creation already exists | put, cp, mv, mkdir, rm |
| TOUCHED | everything is OK, but the requested file has no content | get |
| CONFIRMATION_REQUIRED | procedure requires confirmation to continue | rm |

Implementation details

Technologies

  • Java
  • Docker
  • Gradle

Continuous integration

CI is set up for the project.

Push to GitHub -> automatic Gradle build -> push to Docker Hub

Team

  • Elena Lukyanchikova, B17-SE-01 -- (client, documentation)
  • Rim Rakhimov, B17-SB -- (naming server, deployment)
  • Ruslan Shakirov, B17-SE-01 -- (storage server, deployment)