HashDB

HashDB is a disk-based, highly efficient fixed-store Key-Value Store Database engineered for rapid data retrieval using hash-based indexing. It surpasses Redis, KeyDB, LMDB, RocksDB, SyllaDB, and likely all other alternatives out there! However, it does make certain trade-offs to achieve its performance.

Overview

Traditionally, databases use B/B+ trees for efficient data retrieval, which provides good speed and handles high volumes of data effectively. HashDB, on the other hand, utilizes hash-based indexing instead of traditional B-tree indexing. This approach sacrifices the ability to iterate over the dataset in sorted order but achieves extremely fast data iteration without regard to order.

Purpose

HashDB sacrifices features like sorted iteration and complex indexing (e.g., B/B+ trees) for O(1) lookup performance, making it ideal for scenarios where speed and simplicity in key-data data operations are paramount.

Features

O(1) Fast Lookup: Achieves O(1) lookup time for retrieving data, which is faster compared to traditional databases like LMDB or any Cassandra that typically operate with O(log(n)) lookup.
O(1) Fast Insert: Achieves O(1) insert time for retrieving data, which is faster compared to traditional databases that typically operate with O(log(n)) lookup.
Fixed Space: Each store in HashDB is limited to 2GB (e.g., accommodating up to 150 million entries of integer mapped to a 10-character string)
Forced User Serialization: Users define their serialization and deserialization methods to compact data based on user-defined classes implementing Key and Value interfaces.
Operations Supported: HashDB supports basic operations such as get, remove, and put.

Benchmarks (In server)

Benchmarks (Remote server)

File Structure: HashDB consists of the following files:
- INDEX: Stores hash values for rapid lookup.
- COLLISION: Handles collisions in the hash table.
- DATA: Stores the actual key-data data.
- COLLISION_BUBBLE: Manages deleted records to prevent file fragmentation.
- DATA_BUBBLE: Manages deleted data records similarly to COLLISION_BUBBLE.
- META: Stores end pointers for each file and key-data sizes.

Drawbacks

Iteration: Cant iterate over the data in sorted way or can do binary floor or ceil search on data
Index Size: Index size will be larger than traditional trees and needs to be in a seperate file.

Example Usage

import in.prasannathapa.db.HashDB;
import in.prasannathapa.db.data.IP;

public class Main {

  public static void main(String[] args) throws Exception {
    String dbName = "TestDB";
    HashDB<IP, IP> db = HashDB.createDB(IP.LENGTH, IP.LENGTH, 1000, dbName);

    IP ip1 = new IP("129.168.2.1");
    IP ip2 = new IP("129.168.2.2");

    db.put(ip1, new IP("99.99.29.19"));
    db.put(ip2, new IP("0.0.0.0"));
    HashDB.closeDB(dbName);

    db = HashDB.readFrom(dbName);
    System.out.println(IP.wrap(db.remove(ip2))); //0.0.0.0

    IP data = IP.wrap(db.get(ip1));
    System.out.println(data); //99.99.29.19

    data = IP.wrap(db.get(ip2));
    System.out.println(data); //Null

    HashDB.deleteDB(dbName);
  }
}

This example demonstrates how to create, populate, and read from a HashDB instance.

HashDB is not a production-grade database nor a general-purpose database. However, it excels in scenarios where fast data retrieval is a critical requirements, making it a compelling choice for specific applications.

snehasishroy/HashDB