/bob

Distributed BLOB storage

Primary LanguageRustMIT LicenseMIT

Bob

build Docker Pulls

Bob

Bob is a distributed storage system designed for byte data such as photos. It has decentralized architecture where each node can handle user calls. Pearl is used as a backend storage.

More information can be found on wiki.

Installation

For Linux x86_64, you can use RPM or DEB packages from the releases page. There is also a ZIP archive with compiled x64 binaries.

Docker images can be found on DockerHub qoollo/bob.

Additionally, you can build Bob from sources. Instruction can be found here.

API

Bob uses gRPC API with the following proto file: bob.proto. This means that you can automatically generate a protocol implementation for any platform. Detailed gRPC API description can be found on wiki.

There is also a rich client for .NET: https://github.com/qoollo/bob-client-net.

Tools

You can use bobc and bobp to access data on a cluster. brt can be used to check and recover corrupted BLOBs.

There are also tools for removing old partitions, recovering aliens, expanding cluster and more, that can be found in bob-tools repository.

Performance

Bob can handle more than 10k RPS. Detailed tests and hardware configuration can be found on wiki.

Architecture

cluster.yaml file describes the addresses of all nodes in the cluster and a set of directories/physical disks on each node. All data is logically distributed across virtual disks (vdisks). You should create a cluster with more virtual disks than physical ones (it would be reasonable to place 3..10 vdisks on 1 physical disk). Destination vdisk determs like this: vdisk_id = data_id % vdisks_count. Cluster writes data to all nodes that contain target vdisk.

Example config with 2 nodes and 3 vdisks:

- nodes:
  - name: node1
    address: 127.0.0.1:20000
    disks:
      - name: disk1
        path: /tmp/d1
  - name: node2
    address: 127.0.0.1:20001
    disks:
      - name: disk1
        path: /tmp/d1
      - name: disk2
        path: /tmp/d2

- vdisks:
  - id: 0
    replicas:
      - node: node1
        disk: disk1
  - id: 1
    replicas:
      - node: node2
        disk: disk1
  - id: 2
    replicas:
      - node: node1
        disk: disk1
      - node: node2
        disk: disk2

As you can see, there are no replicated nodes, instead Bob replicates vdisks. Vdisks are distributed across nodes, which provides more flexibility. Some nodes with more powerful hardware can have more vdisks and therefore store more data.

cluster.yaml can be generated semi-automatically with CCG.

Backend

Backend store data in 2 groups: "normal" and "alien" folders. "Normal" are vdisk folders described in cluster config . "Alien" is folder for data that cannot be written to its node due to node unavailability. It has the same structure that "normal" folder but also has "node name" in folder hierarchy. Under vdisk info in folder it has timestamp folder info.

- disk 1
    - vdisk id 1
...
    - vdisk id N
        - timestamp 1
        ...
        - timestamp N
            - blob 1
            ...
            - blob N
...
- disk N
    - Alien
        - node name
            - vdisk id 1
                - timestamp 1
                    - blob 1