Sharded Key-value Storage System

Distributed Systems course: Assignment 2 - Universitat Rovira i Virgili


Table of Contents

  1. Introduction
  2. System Features and Goals
  3. System Implementation
  4. Real Examples of Sharded Storage Systems
  5. Fault Tolerance and Replica Group Management
  6. Conclusion
  7. References
  8. Installation
  9. Evaluation
  10. Authors

Introduction

The Sharded Storage System is a scalable and fault-tolerant solution designed to store and access large volumes of data efficiently. It utilizes sharding techniques to partition data into smaller subsets called shards, which can be distributed across multiple servers. The system aims to achieve high performance, horizontal scalability, and fault tolerance.

System Features and Goals

The key features and goals of the Sharded Storage System are as follows:

  • Scalability: The system should be able to handle increasing data volumes and request loads by scaling both vertically and horizontally.
  • Efficient Data Distribution: Data should be evenly distributed across shards and servers to optimize data access and avoid hotspots.
  • Fault Tolerance: The system should be resilient to server failures and ensure high availability of data and operations.
  • Automatic Failure Detection: The system should detect server failures automatically to maintain data consistency and availability.
  • Dynamic Replica Group Management: In the event of a replica master failure, the system should be able to elect a new master among the remaining replicas.

System Implementation

Vertical and Horizontal Scalability

The Sharded Storage System is designed to be horizontally scalable.

  • Horizontal Scalability: The system achieves horizontal scalability by distributing data across multiple servers, using a hashing algorithm to determine shard placement. Each server is responsible for a key range, stored as a (min, max) tuple over the hashed key space. By adding more servers to the system, data and request load are spread across more machines, ensuring scalability (see the sketch below).
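
The following sketch illustrates this placement scheme. The names hash_key and assign_ranges are illustrative assumptions, not the assignment's actual API: the hashed key space is split into contiguous (min, max) ranges, one per server, and adding a server simply recomputes the ranges over more machines.

# Minimal sketch of key-range based shard placement (illustrative names only,
# not the assignment's actual API). Keys are hashed to an integer space and
# the space is split evenly among the registered servers.

import hashlib

KEY_SPACE = 2 ** 32  # size of the hashed key space (assumption)

def hash_key(key: str) -> int:
    """Map a string key to a position in the key space."""
    return int(hashlib.sha1(key.encode()).hexdigest(), 16) % KEY_SPACE

def assign_ranges(servers: list[str]) -> dict[str, tuple[int, int]]:
    """Split the key space into contiguous (min, max) ranges, one per server."""
    n = len(servers)
    size = KEY_SPACE // n
    ranges = {}
    for i, server in enumerate(servers):
        lo = i * size
        hi = KEY_SPACE - 1 if i == n - 1 else (i + 1) * size - 1
        ranges[server] = (lo, hi)
    return ranges

# Adding a server simply recomputes the ranges over more machines.
print(assign_ranges(["server-0", "server-1"]))
print(assign_ranges(["server-0", "server-1", "server-2"]))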

Sharding Technique

Sharding is a technique used to partition data into smaller subsets called shards. Each shard contains a subset of the data, and multiple shards collectively hold the entire dataset.

  • Query Routing: When a client sends a query, the system uses the shard key to determine the target shard(s) for the query. Because the router knows which shard holds the relevant data, the query is sent directly to the appropriate shard(s), minimizing the need to scan the entire dataset (see the sketch below).
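
A minimal routing sketch follows, assuming the per-server key ranges described above; the names RANGES and route are illustrative only.

# Illustrative query routing (not the assignment's actual API): the router
# hashes the shard key and forwards the request only to the server whose
# (min, max) range contains that hash, instead of scanning every shard.

import hashlib

KEY_SPACE = 2 ** 32

def hash_key(key: str) -> int:
    return int(hashlib.sha1(key.encode()).hexdigest(), 16) % KEY_SPACE

# Example routing table: server -> (min, max) of the hashed key space.
RANGES = {
    "server-0": (0, KEY_SPACE // 3 - 1),
    "server-1": (KEY_SPACE // 3, 2 * KEY_SPACE // 3 - 1),
    "server-2": (2 * KEY_SPACE // 3, KEY_SPACE - 1),
}

def route(key: str) -> str:
    """Return the server whose range contains the hashed key."""
    h = hash_key(key)
    for server, (lo, hi) in RANGES.items():
        if lo <= h <= hi:
            return server
    raise KeyError(f"no shard covers hash {h}")

print("GET user:42 ->", route("user:42"))  # only this shard is contacted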

Fault Tolerance

The Sharded Storage System incorporates fault tolerance mechanisms to ensure high availability and data consistency.

  • Replication: Each shard in the system maintains multiple replicas to provide redundancy. Replicas are copies of the shard's data stored on different servers. If a server hosting a replica fails, the system can rely on other replicas to continue serving data and processing requests.

  • Automatic Failure Detection: To detect server failures, the system uses heartbeat messages or monitoring processes. Each server periodically sends heartbeat messages to its replicas or a monitoring process. If a heartbeat is not received within a certain time period, the server is considered failed and appropriate actions are taken (a simplified sketch of this mechanism follows this list).

  • Master Election: In the event of a replica master failure, the remaining replicas participate in an election process to select a new replica master. Various algorithms can be used, such as the Raft algorithm or the Paxos algorithm, to achieve consensus among the replicas and elect a new master. The elected replica takes over the responsibilities of the failed master.
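
The snippet below is a simplified sketch of a heartbeat-based failure detector; the class and method names, as well as the timeout value, are assumptions for illustration and not the assignment's interface.

# Simplified heartbeat-based failure detection (an illustration, not the
# assignment implementation). Each replica records the last time it heard
# from every peer; a peer that stays silent longer than TIMEOUT is marked failed.

import time

TIMEOUT = 3.0  # seconds without a heartbeat before declaring a peer failed

class FailureDetector:
    def __init__(self, peers: list[str]):
        now = time.monotonic()
        self.last_seen = {peer: now for peer in peers}

    def on_heartbeat(self, peer: str) -> None:
        """Called whenever a heartbeat message arrives from a peer."""
        self.last_seen[peer] = time.monotonic()

    def failed_peers(self) -> list[str]:
        """Peers whose last heartbeat is older than TIMEOUT."""
        now = time.monotonic()
        return [p for p, t in self.last_seen.items() if now - t > TIMEOUT]

detector = FailureDetector(["replica-1", "replica-2", "replica-3"])
detector.on_heartbeat("replica-1")
time.sleep(0.1)
print(detector.failed_peers())  # [] until a peer misses heartbeats for TIMEOUT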

Real Examples of Sharded Storage Systems

Example 1: MongoDB

MongoDB is a popular document-oriented database that utilizes sharding for scalability and performance.

Architecture Diagram:

+-----------------+         +---------------------+
|                 |         |                     |
|   MongoDB       |         |  MongoDB            |
|   Query Router  +---------> Shard 1             |
|   (mongos)      |         |                     |
|                 |         +---------------------+
|                 |
|                 |         +---------------------+
|                 |         |                     |
|                 +---------> Shard 2             |
|                 |         |                     |
|                 |         +---------------------+
|                 |
|                 |         +---------------------+
|                 |         |                     |
|                 +---------> Shard 3             |
|                 |         |                     |
|                 |         +---------------------+
+-----------------+

In MongoDB, sharding is achieved by dividing data into chunks based on a shard key. Each shard is a separate MongoDB replica set that contains a subset of the data. The system uses a configuration server to track metadata about the sharded data, including chunk ranges and shard mappings. The query router, known as the mongos process, receives queries from clients and routes them to the appropriate shards based on the shard key.
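
As a hedged illustration of this flow, the snippet below enables sharding through a mongos router using the pymongo driver; the host, database, collection, and shard key are assumptions made for the example.

# Hedged example: enabling sharding on a collection through a mongos router
# with pymongo. Host/port and the db/collection names are assumptions.

from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")  # connect to the mongos query router

# Enable sharding for the database, then shard the collection on a hashed key.
client.admin.command("enableSharding", "mydb")
client.admin.command("shardCollection", "mydb.users", key={"user_id": "hashed"})

# From here on, mongos routes each query to the shard(s) owning the key range.
users = client["mydb"]["users"]
users.insert_one({"user_id": 42, "name": "Alice"})
print(users.find_one({"user_id": 42}))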

Example 2: Apache Cassandra

Apache Cassandra is a distributed and highly scalable NoSQL database that utilizes a decentralized peer-to-peer architecture.

Architecture Diagram:

+-----------------+
|                 |
|   Apache        |
|   Cassandra     |
|                 |
|                 |
|    +-----------+-------------+
|    |           |             |
|    |   Node 1  |   Node 2    |
|    |           |             |
|    +-----------+-------------+
|                 |
|    +-----------+-------------+
|    |           |             |
|    |   Node 3  |   Node 4    |
|    |           |             |
|    +-----------+-------------+
|                 |
|    +-----------+-------------+
|    |           |             |
|    |   Node 5  |   Node 6    |
|    |           |             |
|    +-----------+-------------+
|                 |
+-----------------+

In Cassandra, data is distributed across multiple nodes in a ring-based architecture. Each node in the cluster is responsible for a portion of the data based on a partition key. The system uses a gossip protocol for failure detection and membership management. The replicas are distributed across multiple nodes using a replication strategy, such as SimpleStrategy or NetworkTopologyStrategy, to ensure fault tolerance and data redundancy.
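
A hedged example of these concepts with the DataStax Python driver (cassandra-driver) is shown below; the contact point, keyspace, and table names are assumptions.

# Hedged example: creating a replicated keyspace and a partitioned table with
# the DataStax cassandra-driver. Contact points and names are assumptions.

from cassandra.cluster import Cluster

cluster = Cluster(["127.0.0.1"])   # any node can act as a coordinator
session = cluster.connect()

# The replication strategy (SimpleStrategy or NetworkTopologyStrategy)
# controls how replicas are placed across the ring.
session.execute("""
    CREATE KEYSPACE IF NOT EXISTS kvstore
    WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3}
""")

# The partition key (key) decides which nodes of the ring own each row.
session.execute("""
    CREATE TABLE IF NOT EXISTS kvstore.data (key text PRIMARY KEY, value text)
""")

session.execute("INSERT INTO kvstore.data (key, value) VALUES (%s, %s)", ("user:42", "Alice"))
print(session.execute("SELECT value FROM kvstore.data WHERE key = %s", ("user:42",)).one())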

Fault Tolerance and Replica Group Management

For fault tolerance and replica group management, the Sharded Storage System implements the following strategies:

  • Automatic Failure Detection: The system regularly sends heartbeat messages or monitors the health of each server in the replica group. If a server fails to respond within a specified time, it is considered failed, and appropriate actions are taken.

  • Master Election: In the event of a replica master failure, the remaining replicas participate in an election process to select a new replica master. Consensus algorithms such as Raft or Paxos can be used to ensure agreement among the replicas. Once a new master is elected, it assumes the responsibilities of the failed master (a simplified illustration follows below).
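
The sketch below uses a deliberately simplified election rule (the surviving replica with the lowest id becomes master) purely to illustrate the idea; a production system would rely on a consensus protocol such as Raft or Paxos as noted above.

# Simplified master election among surviving replicas (an illustration only;
# real systems would use a consensus protocol such as Raft or Paxos).
# The live replica with the lowest id deterministically becomes master,
# so all survivors reach the same decision without extra coordination.

def elect_master(replicas: dict[int, bool]) -> int:
    """Given replica_id -> is_alive, return the new master's id."""
    alive = [rid for rid, up in replicas.items() if up]
    if not alive:
        raise RuntimeError("no replicas left to elect")
    return min(alive)

group = {1: False, 2: True, 3: True}  # replica 1 (the old master) has failed
print("new master:", elect_master(group))  # -> 2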

Conclusion

The Sharded Storage System is designed to provide scalability, fault tolerance, and high performance for storing and accessing large volumes of data. By utilizing sharding techniques, the system distributes data across multiple servers, allowing it to scale horizontally as data volumes and request loads grow. Fault tolerance mechanisms, including replication and automatic failure detection, ensure high availability and data consistency. Real-world systems such as MongoDB and Apache Cassandra demonstrate practical implementations of sharded storage.

By addressing the system's features and goals, explaining the design choices, and discussing fault tolerance mechanisms, this document provides an overview of the Sharded Storage System's architecture and functionality.

References

  1. Redis
  2. MongoDB
  3. Apache Cassandra

Installation

• Linux

python3 -m pip install -r requirements.txt
python3 -m grpc_tools.protoc --proto_path=. --grpc_python_out=. --pyi_out=. --python_out=. ./KVStore/protos/*.proto
python3 -m pip install -e .

• Windows

py -m pip install -r requirements.txt
py -m grpc_tools.protoc --proto_path=. --grpc_python_out=. --pyi_out=. --python_out=. ./KVStore/protos/*.proto
py -m pip install -e .

Evaluation

First subtask (simple KV storage)

python3 eval/single_node_storage.py

Second subtask (sharded KV storage)

python3 eval/sharded.py

Third subtask (sharded KV storage with replica groups)

python3 eval/replicas.py

Authors