/dash

Scalable Hashing on Persistent Memory

Primary LanguageC++MIT LicenseMIT

Dash: Scalable Hashing on Persistent Memory

Persistent memory friendly hashing index, to appear at VLDB 2020.

More details are described in our VLDB paper below. If you use our work, please cite:

Baotong Lu, Xiangpeng Hao, Tianzheng Wang, Eric Lo:
Dash: Scalable Hashing on Persistent Memory. 
PVLDB 13(8): 1147-1161 (2020)

What's included

  • Dash EH - Proposed Dash extendible hashing
  • Dash LH - Proposed Dash linear Hashing
  • CCEH - PMDK patched CCEH variant used in our benchmark
  • Level Hashing - PMDK patched level hashing variant used in our benchmark
  • Mini benchmark framework
  • Example program - how to integrate Dash to your application

Fully open-sourced under MIT license.

Building

Dependencies

We tested our build with Linux Kernel 5.5.3 and GCC 9.2. You must ensure that your Linux kernel version >= 4.17 and glibc >=2.29 since we use MAP_FIXED_NOREPLACE in our customized PMDK.

The external dependencies are our customized PMDK and epoch manager, which are also open-sourced.

Compiling

Assuming to compile under a build directory:

git clone https://github.com/baotonglu/dash.git
cd dash
mkdir build && cd build
cmake -DCMAKE_BUILD_TYPE=Release -DUSE_PMEM=ON .. 
make -j

Running benchmark

As stated in our paper, we run the tests in a single NUMA node with 24 physical CPU cores. We pin threads to physical cores compactly assuming thread ID == core ID (e.g., for a dual-socket system, we assume cores 0-23 are located in socket 0, and cores 24-47 in socket 1). To run benchmarks, use the test_pmem executable in the build directory. It supports the following arguments:

./build/test_pmem --helpshort
Usage: 
    ./build/test_pmem [OPTION...]

-index      the index to evaluate:dash-ex/dash-lh/cceh/level (default: "dash-ex")
-op         the type of operation to execute:insert/pos/neg/delete/mixed (default: "full")
-n          the number of warm-up workload (default: 0)
-p          the number of operations(insert/search/delete) to execute (default: 20000000)
-t          the number of concurrent threads (default: 1)
-r          search ratio for mixed workload: 0.0~1.0 (default: 1.0)
-s          insert ratio for mixed workload: 0.0~1.0 (default: 0.0)
-d          delete ratio for mixed workload: 0.0~1.0 (default: 0.0)
-e          whether to register epoch in application level: 0/1 (default: 0)
-k          the type of stored keys: fixed/variable (default: "fixed")
-vl         the length of the variable length key (default: 16)

Check out also the run.sh script for example benchmarks and easy testing of the hash tables.

Example program

To know how to integrate the Dash into your application, check out example.cpp under src. The executable is example under your build directory. Also check CMakeLists.txt to know how to link with dependencies (customized PMDK and epoch manager) for correct build.

Miscellaneous

We noticed a possible mmap bug on our testing environment: MAP_SHARED_VALIDATE is incompatible with MAP_FIXED_NOREPLACE (since Linux 4.17). To ensure safe memory mapping, we modified the original PMDK to use MAP_SHARED rather than MAP_SHARED_VALIDATE, which has the same functionality as the former one except for extra flag validation. For a more detailed explanation and minimal reproducible code, please check out our blog post about this issue.

Contact

For any questions, please contact us at btlu@cse.cuhk.edu.hk and tzwang@sfu.ca.