
D-RocksDB disaggregates the memory of RocksDB's block cache, but with my implementation this does not seem very useful.

Primary language: C++

Disaggregated RocksDB

Disaggregating RocksDB's block cache does not actually seem very useful, given how fast SSDs are.

However, you may still find the RDMA-related code useful if you are new to RDMA.

TODO

D-RocksDB

Intuitive Version

  • 1. a simple remote memory allocator
  • 2. copy and tidy up the LRUCache code to create RMLRUCache, removing the secondary cache logic for convenience
  • 3. a simple RDMA server/interface for convenient fetch and store operations
    • a server that handles control messages and CM events
    • a client with one queue pair (QP) to read/write remote memory
    • unit tests for the above
  • 4. embed the RDMA interface into RocksDB
  • 5. implement the remote memory logic for LRUCache.
    • LRUHandle: modify its fields to support the operations below
    • rm_lru: implement the rm_lru-related methods (the simplest LRU)
    • eviction
      • evict local blocks to remote memory when local memory is exceeded
      • evict remote blocks when total memory is exceeded
        • shard remote memory so that each shard controls its own remote memory (rm); otherwise an allocation may fail because the memory has been fragmented by other shards
    • fetch a block if it is remote
    • statistics about the remote memory
      • number of hits in rm vs. hits in local memory (lm)
      • time per hit in rm vs. in lm (i.e., the rm overhead)
      • lm/rm usage stats (through GetMapProperty)
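The eviction, fetch, and hit-statistics steps above can be illustrated with a small single-threaded model of a two-tier LRU. This is a hypothetical sketch: `TwoTierLRU` and its methods are illustrative names, not the repository's RMLRUCache API. RDMA transfers are replaced by moving entries between two in-process lists, and capacities are counted in entries rather than bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <list>
#include <optional>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical two-tier LRU model. "Local" is in-process memory; "remote" is a
// stand-in for RDMA-backed memory. Capacities count entries, not bytes, and
// local_cap is assumed to be at least 1.
class TwoTierLRU {
 public:
  TwoTierLRU(std::size_t local_cap, std::size_t remote_cap)
      : local_cap_(local_cap), remote_cap_(remote_cap) {}

  void Insert(const std::string& key, std::string value) {
    Erase(key);
    local_.emplace_front(key, std::move(value));
    index_.emplace(key, Slot{true, local_.begin()});
    EvictIfNeeded();
  }

  // Local hit: move to the local MRU position. Remote hit: "fetch" the entry
  // back into the local tier, which may push another entry out to remote.
  std::optional<std::string> Lookup(const std::string& key) {
    auto it = index_.find(key);
    if (it == index_.end()) return std::nullopt;
    Slot& slot = it->second;
    if (slot.local) {
      ++local_hits_;
      local_.splice(local_.begin(), local_, slot.pos);
    } else {
      ++remote_hits_;
      local_.splice(local_.begin(), remote_, slot.pos);
      slot.local = true;
    }
    std::string value = local_.front().second;
    EvictIfNeeded();
    return value;
  }

  std::size_t local_hits() const { return local_hits_; }
  std::size_t remote_hits() const { return remote_hits_; }
  bool Contains(const std::string& key) const { return index_.count(key) > 0; }
  bool IsLocal(const std::string& key) const {
    auto it = index_.find(key);
    return it != index_.end() && it->second.local;
  }

 private:
  using List = std::list<std::pair<std::string, std::string>>;
  struct Slot {
    bool local;
    List::iterator pos;  // stays valid across splice between the two lists
  };

  void Erase(const std::string& key) {
    auto it = index_.find(key);
    if (it == index_.end()) return;
    (it->second.local ? local_ : remote_).erase(it->second.pos);
    index_.erase(it);
  }

  void EvictIfNeeded() {
    // Local memory exceeded: move the local LRU entry to the remote tier.
    while (local_.size() > local_cap_) {
      auto last = std::prev(local_.end());
      index_.at(last->first).local = false;
      remote_.splice(remote_.begin(), local_, last);
    }
    // Total memory (local + remote) exceeded: drop the remote LRU entry.
    while (remote_.size() > remote_cap_) {
      index_.erase(remote_.back().first);
      remote_.pop_back();
    }
  }

  std::size_t local_cap_, remote_cap_;
  List local_, remote_;
  std::unordered_map<std::string, Slot> index_;
  std::size_t local_hits_ = 0, remote_hits_ = 0;
};
```

In an actual disaggregated cache the remote tier would hold remote addresses rather than the values themselves; the list moves here only model which tier owns each block.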

Improved Version v1.1.0

  • try to treat remote memory as blocks, i.e., allocate exactly one remote block for each cache block (this assumes a cache block always fits in one remote block)
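A minimal sketch of the block-based idea, combined with the per-shard remote memory from the Intuitive Version: the remote region is split into equal per-shard ranges, and each shard hands out fixed-size block offsets from its own free list, so one shard's allocations cannot fragment another's. `ShardedRemoteBlocks` and its methods are hypothetical; the offsets stand in for addresses inside a registered remote region, and locking is omitted on the assumption that each shard is only touched under its own cache-shard mutex.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical allocator over a remote region of `total_bytes`, carved into
// `num_shards` equal ranges of fixed-size blocks (block_size > 0 assumed).
// Each shard only touches its own free list.
class ShardedRemoteBlocks {
 public:
  ShardedRemoteBlocks(uint64_t total_bytes, uint64_t block_size, int num_shards)
      : block_size_(block_size), free_(num_shards) {
    uint64_t per_shard = total_bytes / num_shards / block_size * block_size;
    for (int s = 0; s < num_shards; ++s) {
      uint64_t base = s * per_shard;
      // Push in reverse so blocks are handed out in increasing offset order.
      for (uint64_t off = per_shard; off >= block_size_; off -= block_size_)
        free_[s].push_back(base + off - block_size_);
    }
  }

  // One remote block per cache block. Fails if the entry does not fit in a
  // block or this shard has no free blocks left.
  std::optional<uint64_t> Allocate(int shard, uint64_t charge) {
    if (charge > block_size_ || free_[shard].empty()) return std::nullopt;
    uint64_t off = free_[shard].back();
    free_[shard].pop_back();
    return off;
  }

  void Free(int shard, uint64_t offset) { free_[shard].push_back(offset); }
  std::size_t FreeBlocks(int shard) const { return free_[shard].size(); }

 private:
  uint64_t block_size_;
  std::vector<std::vector<uint64_t>> free_;
};
```

Because every block has the same size, freeing and reallocating can never leave unusable gaps; the trade-off is wasted space whenever a cache block is smaller than the fixed block size.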

Improved Version v2.0.0

  • support async read/write
    • modify rdma_transport to support async read/write (ignoring potential races)
    • modify rm to support async ops
      • AsyncRequest holds either a buffer to receive the remote value or a pointer to the buffer to be sent
    • modify DLRUCache to use async ops
      • perform the transfers outside the mutex
      • call wait when a DLRUHandle is used (e.g., in Lookup), and free it if necessary
  • modify rm to support a pool of rdma_transport instances (to avoid contention)
  • overlap RDMA reads and writes as much as possible
    • overlap the read/write exchange in Lookup
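The async flow above (issue the transfer outside the mutex, wait only when the handle is actually used, and spread requests over a pool of transports) can be modeled without RDMA. In this hypothetical sketch, `FakeTransport::AsyncRead` stands in for posting an RDMA read, and `std::future` stands in for polling the completion queue. The shapes of `AsyncRequest` and `TransportPool` are assumptions, not the repository's definitions.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <future>
#include <memory>
#include <string>
#include <vector>

// Stand-in for one rdma_transport (one QP). A real version would post an
// RDMA READ into `buf` and complete when the CQ reports it.
struct FakeTransport {
  std::string remote;  // pretend remote memory
  std::future<void> AsyncRead(std::size_t off, std::size_t len, std::string* buf) {
    return std::async(std::launch::async,
                      [this, off, len, buf] { *buf = remote.substr(off, len); });
  }
};

// Hypothetical request: owns the receive buffer and the pending completion.
// `done` is declared after `buf`, so it is destroyed first; a std::async
// future blocks in its destructor, so `buf` outlives any in-flight write.
struct AsyncRequest {
  std::string buf;
  std::future<void> done;
  // Called when the handle is actually used (e.g., in Lookup).
  const std::string& Wait() {
    if (done.valid()) done.get();
    return buf;
  }
};

// Round-robin pool so concurrent shards do not all contend on one transport.
class TransportPool {
 public:
  TransportPool(std::size_t n, const std::string& remote) : ts_(n) {
    for (auto& t : ts_) t.remote = remote;
  }

  // Issue the read and return immediately; the caller waits later.
  std::unique_ptr<AsyncRequest> Read(std::size_t off, std::size_t len) {
    auto req = std::make_unique<AsyncRequest>();
    FakeTransport& t = ts_[next_.fetch_add(1) % ts_.size()];
    req->done = t.AsyncRead(off, len, &req->buf);
    return req;
  }

 private:
  std::vector<FakeTransport> ts_;
  std::atomic<std::size_t> next_{0};
};
```

Issuing both reads before waiting on either is the kind of overlap the list aims for: the second transfer proceeds while the first is still in flight.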

Improved Version v3.0.0

  • local BlockBasedMemoryAllocator
    • a basic usable allocator (with a custom deleter)
    • shard the memory region to avoid lock contention
  • register the local BlockBasedMemoryAllocator's memory for RDMA
  • read/write directly to avoid copying
    • sync version
    • async version
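One way to build a basic allocator with a custom deleter is to carve a single contiguous buffer into fixed-size blocks, shard the free lists, and hand blocks out as `std::unique_ptr`s whose deleter returns each block to its shard. A single region is what would let the memory be registered once (e.g. with `ibv_reg_mr`) so RDMA can read and write blocks in place. The class and method names below are hypothetical; the actual BlockBasedMemoryAllocator and how it plugs into RocksDB's `MemoryAllocator` interface may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>
#include <vector>

// Hypothetical block allocator over one contiguous region. Blocks must not
// outlive the allocator, since the deleter returns them to it.
class BlockAllocator {
 public:
  struct Deleter {
    BlockAllocator* owner;
    std::size_t shard;
    void operator()(char* p) const { owner->Release(shard, p); }
  };
  using Block = std::unique_ptr<char, Deleter>;

  BlockAllocator(std::size_t block_size, std::size_t blocks_per_shard,
                 std::size_t num_shards)
      : block_size_(block_size),
        region_(block_size * blocks_per_shard * num_shards),
        shards_(num_shards) {
    for (std::size_t s = 0; s < num_shards; ++s)
      for (std::size_t b = 0; b < blocks_per_shard; ++b)
        shards_[s].free.push_back(region_.data() +
                                  (s * blocks_per_shard + b) * block_size);
  }

  // Returns an empty Block if the request is too large or the shard is full.
  Block Allocate(std::size_t size, std::size_t shard) {
    Shard& sh = shards_[shard];
    std::lock_guard<std::mutex> lock(sh.mu);  // only this shard is locked
    if (size > block_size_ || sh.free.empty())
      return Block(nullptr, Deleter{this, shard});
    char* p = sh.free.back();
    sh.free.pop_back();
    return Block(p, Deleter{this, shard});
  }

  // The whole region, e.g. what would be registered for RDMA.
  char* region() { return region_.data(); }
  std::size_t region_bytes() const { return region_.size(); }

  std::size_t FreeBlocks(std::size_t shard) {
    std::lock_guard<std::mutex> lock(shards_[shard].mu);
    return shards_[shard].free.size();
  }

 private:
  struct Shard {
    std::mutex mu;
    std::vector<char*> free;
  };

  void Release(std::size_t shard, char* p) {
    std::lock_guard<std::mutex> lock(shards_[shard].mu);
    shards_[shard].free.push_back(p);
  }

  std::size_t block_size_;
  std::vector<char> region_;
  std::vector<Shard> shards_;
};
```

A `unique_ptr` with a null pointer never invokes its deleter, so a failed allocation costs nothing to destroy.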

YCSB

  • support configuring whether to use d_lru_cache or the normal lru_cache
  • support configuring rm_ratio
  • modify the value generator to derive each value from a transformation of its key, making correctness easier to verify
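A sketch of a key-derived value generator: the value is a deterministic function of the key, so any read can be checked without keeping a copy of what was written. The FNV-1a hash and the LCG-based expansion to the requested length are assumptions chosen for illustration; the transformation actually used in the modified YCSB may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// 64-bit FNV-1a hash of the key (an illustrative choice of transformation).
inline uint64_t Fnv1a(const std::string& s) {
  uint64_t h = 14695981039346656037ULL;  // FNV offset basis
  for (unsigned char c : s) {
    h ^= c;
    h *= 1099511628211ULL;  // FNV prime
  }
  return h;
}

// Hypothetical generator: expand the key's hash into `len` lowercase letters
// using a 64-bit LCG, so the value depends only on the key and length.
inline std::string ValueForKey(const std::string& key, std::size_t len) {
  std::string v(len, '\0');
  uint64_t x = Fnv1a(key);
  for (std::size_t i = 0; i < len; ++i) {
    x = x * 6364136223846793005ULL + 1442695040888963407ULL;
    v[i] = static_cast<char>('a' + (x >> 33) % 26);
  }
  return v;
}

// Correctness check on read: recompute the expected value from the key.
inline bool VerifyValue(const std::string& key, const std::string& value) {
  return value == ValueForKey(key, value.size());
}
```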