
Source code for the FAST '23 paper “MadFS: Per-File Virtualization for Userspace Persistent Memory Filesystems”


MadFS


Source code for the FAST '23 paper “MadFS: Per-File Virtualization for Userspace Persistent Memory Filesystems” by Shawn Zhong*, Chenhao Ye*, Guanzhou Hu, Suyan Qu, Andrea Arpaci-Dusseau, Remzi Arpaci-Dusseau, and Michael Swift (*equal contribution). FAST '23: Paper, Video, Slides, Code.

Abstract

Persistent memory (PM) can be accessed directly from userspace without kernel involvement, but most PM filesystems still perform metadata operations in the kernel for security and rely on the kernel for cross-process synchronization.

We present per-file virtualization, where a virtualization layer implements a complete set of file functionalities, including metadata management, crash consistency, and concurrency control, in userspace. We observe that not all file metadata need to be maintained by the kernel and propose embedding insensitive metadata into the file for userspace management. For crash consistency, copy-on-write (CoW) benefits from the embedding of the block mapping since the mapping can be efficiently updated without kernel involvement. For cross-process synchronization, we introduce lock-free optimistic concurrency control (OCC) at user level, which tolerates process crashes and provides better scalability.
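As a rough illustration of lock-free optimistic commit, the C++ sketch below (our own simplification, not MadFS's implementation) lets writers share a fixed array of log slots. A writer prepares its data elsewhere, as in copy-on-write, then commits by claiming the next empty slot with a single compare-and-swap. If another writer already filled that slot, the CAS fails and the writer moves on and retries instead of blocking on a lock. `LogEntry`, `SharedLog`, and the 64-bit slot encoding are hypothetical.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>

// Hypothetical log entry: "virtual block `vblock` is now stored in physical
// block `pblock`". Physical block 0 is assumed reserved, so a committed entry
// never encodes to 0, which marks an empty slot.
struct LogEntry {
  std::uint32_t vblock;
  std::uint32_t pblock;
};

constexpr std::uint64_t kEmptySlot = 0;

constexpr std::uint64_t encode(LogEntry e) {
  return (std::uint64_t{e.vblock} << 32) | e.pblock;
}

constexpr LogEntry decode(std::uint64_t word) {
  return {static_cast<std::uint32_t>(word >> 32),
          static_cast<std::uint32_t>(word)};
}

// Fixed-capacity, append-only log shared by all writers. No locks: each
// writer commits by atomically claiming an empty slot.
template <std::size_t N>
class SharedLog {
 public:
  // Optimistically append `e`. Returns the slot index on success, or
  // std::nullopt if the log is full.
  std::optional<std::size_t> append(LogEntry e) {
    assert(e.pblock != 0 && "physical block 0 is reserved in this sketch");
    const std::uint64_t word = encode(e);
    for (std::size_t idx = tail_hint_.load(std::memory_order_acquire); idx < N;
         ++idx) {
      std::uint64_t expected = kEmptySlot;
      // The commit succeeds only if no other writer has filled this slot yet.
      if (slots_[idx].compare_exchange_strong(expected, word,
                                              std::memory_order_acq_rel)) {
        // Best-effort hint; a stale value only costs extra scanning.
        tail_hint_.store(idx + 1, std::memory_order_release);
        return idx;
      }
      // Conflict: another writer committed here first. Move on and retry.
    }
    return std::nullopt;
  }

  // Read a committed entry, or std::nullopt if the slot is still empty.
  std::optional<LogEntry> get(std::size_t idx) const {
    const std::uint64_t word = slots_[idx].load(std::memory_order_acquire);
    if (word == kEmptySlot) return std::nullopt;
    return decode(word);
  }

 private:
  std::array<std::atomic<std::uint64_t>, N> slots_{};
  std::atomic<std::size_t> tail_hint_{0};
};
```

For example, calling `append({7, 42})` and then `append({8, 43})` on a `SharedLog<2>` fills slots 0 and 1, and a third call returns `std::nullopt` because the log is full. A complete OCC scheme would also check, after a failed CAS, whether the winning entry touched the same blocks and redo its copy-on-write if so; this sketch only retries the slot.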

Based on per-file virtualization, we implement MadFS, a library PM filesystem that maintains the embedded metadata as a compact log. Experimental results show that on concurrent workloads, MadFS achieves up to 3.6x the throughput of ext4-DAX. For real-world applications, MadFS provides up to 48% speedup for YCSB on LevelDB and 85% for TPC-C on SQLite compared to NOVA.
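Because the block mapping is embedded in the file as a log, a process opening the file can rebuild the mapping by replaying the entries in commit order, with later entries overriding earlier ones. The C++ sketch below illustrates only that idea; `MapEntry`, `replay`, `translate`, the in-memory vector, and the 4096-byte block size are assumptions for illustration and do not reflect MadFS's on-PM format.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <unordered_map>
#include <vector>

// Hypothetical log record: after a copy-on-write, virtual block `vblock`
// of the file lives in physical block `pblock` of the PM region.
struct MapEntry {
  std::uint32_t vblock;
  std::uint32_t pblock;
};

using BlockTable = std::unordered_map<std::uint32_t, std::uint32_t>;

// Rebuild the virtual-to-physical block table by replaying the log in
// commit order; a later entry for the same virtual block wins.
BlockTable replay(const std::vector<MapEntry>& log) {
  BlockTable table;
  for (const MapEntry& e : log) table[e.vblock] = e.pblock;
  return table;
}

// Translate a byte offset in the file to a byte offset in the PM region,
// or std::nullopt if that block was never written (a hole).
std::optional<std::uint64_t> translate(const BlockTable& table,
                                       std::uint64_t file_offset,
                                       std::uint64_t block_size = 4096) {
  assert(block_size > 0);
  auto it = table.find(static_cast<std::uint32_t>(file_offset / block_size));
  if (it == table.end()) return std::nullopt;
  return std::uint64_t{it->second} * block_size + file_offset % block_size;
}
```

Replaying the entries `0→10`, `1→11`, `0→12` leaves virtual block 0 at physical block 12, since the later entry overrides the earlier one; no kernel call is needed to update or read the mapping.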

BibTex
@inproceedings {285756,
author = {Shawn Zhong and Chenhao Ye and Guanzhou Hu and Suyan Qu and Andrea Arpaci-Dusseau and Remzi Arpaci-Dusseau and Michael Swift},
title = {{MadFS}: {Per-File} Virtualization for Userspace Persistent Memory Filesystems},
booktitle = {21st USENIX Conference on File and Storage Technologies (FAST 23)},
year = {2023},
isbn = {978-1-939133-32-8},
address = {Santa Clara, CA},
pages = {265--280},
url = {https://www.usenix.org/conference/fast23/presentation/zhong},
publisher = {USENIX Association},
month = feb,
}

Prerequisites

  • MadFS is developed on Ubuntu 20.04.3 LTS and Ubuntu 22.04.1 LTS. It should work on other Linux distributions as well.

  • MadFS requires a C++ compiler with C++20 support. Compilers known to work include GCC 11.3.0, GCC 10.3.0, Clang 14.0.0, and Clang 10.0.0.

  • Install dependencies and configure the system
    • Install build dependencies

      sudo apt update
      sudo apt install -y cmake build-essential gcc-10 g++-10
    • Install development dependencies (optional)

      # to run sanitizers and formatter
      sudo apt install -y clang-10 libstdc++-10-dev clang-format-10
      # for perf
      sudo apt install -y linux-tools-common linux-tools-generic linux-tools-`uname -r`
      # for managing persistent memory and NUMA
      sudo apt install -y ndctl numactl
      # for benchmarking
      sudo apt install -y sqlite3
    • Configure the system

      ./scripts/init.py
  • Configure persistent memory
    • To emulate a persistent memory device using DRAM, please follow the guide here.

    • Initialize namespaces (optional)

      # remove existing namespaces on region0
      sudo ndctl destroy-namespace all --region=region0 --force 
      # create new namespace `/dev/pmem0` on region0
      sudo ndctl create-namespace --region=region0 --size=20G
      # create new namespace `/dev/pmem0.1` on region0 for NOVA (optional)
      sudo ndctl create-namespace --region=region0 --size=20G
      # list all namespaces
      ndctl list --region=0 --namespaces --human --idle
    • Use /dev/pmem0 to mount ext4-DAX at /mnt/pmem0-ext4-dax

      # create filesystem
      sudo mkfs.ext4 /dev/pmem0
      # create mount point
      sudo mkdir -p /mnt/pmem0-ext4-dax
      # mount filesystem
      sudo mount -o dax /dev/pmem0 /mnt/pmem0-ext4-dax
      # make the mount point writable
      sudo chmod a+w /mnt/pmem0-ext4-dax
      # check mount status
      mount -v | grep /mnt/pmem0-ext4-dax
    • Use /dev/pmem0.1 to mount NOVA at /mnt/pmem0-nova (optional)

      # load NOVA module
      sudo modprobe nova
      # create mount point
      sudo mkdir -p /mnt/pmem0-nova
      # mount filesystem
      sudo mount -t NOVA -o init -o data_cow /dev/pmem0.1 /mnt/pmem0-nova
      # make the mount point writable
      sudo chmod a+w /mnt/pmem0-nova
      # check mount status
      mount -v | grep /mnt/pmem0-nova
    • To unmount the filesystems, run

      sudo umount /mnt/pmem0-ext4-dax
      sudo umount /mnt/pmem0-nova
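After mounting ext4-DAX, one way to confirm that the mount offers synchronous DAX mappings (the `map_sync` feature listed in MadFS's build options) is to attempt `mmap` with `MAP_SHARED_VALIDATE | MAP_SYNC`. Linux accepts that combination only for DAX-capable files; otherwise the call fails, typically with `EOPNOTSUPP`. The C++ helper below is an illustrative check of our own, not a MadFS tool; the function name `supports_map_sync` and the scratch file name are hypothetical.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <cstdio>
#include <cstring>
#include <string>

#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

// Fallbacks for older libc headers; these are the x86-64 Linux values.
#ifndef MAP_SHARED_VALIDATE
#define MAP_SHARED_VALIDATE 0x03
#endif
#ifndef MAP_SYNC
#define MAP_SYNC 0x80000
#endif

// Returns true if a MAP_SYNC mapping can be created for a scratch file in
// `dir` (for example "/mnt/pmem0-ext4-dax"). The scratch file is deleted.
bool supports_map_sync(const std::string& dir) {
  assert(!dir.empty() && "directory path required");
  const std::string path = dir + "/.map_sync_probe";  // hypothetical name
  const int fd = open(path.c_str(), O_RDWR | O_CREAT | O_TRUNC, 0600);
  if (fd < 0) return false;  // directory missing or not writable

  constexpr std::size_t kLen = 4096;
  bool ok = false;
  if (ftruncate(fd, kLen) == 0) {
    void* p = mmap(nullptr, kLen, PROT_READ | PROT_WRITE,
                   MAP_SHARED_VALIDATE | MAP_SYNC, fd, 0);
    if (p != MAP_FAILED) {
      ok = true;
      munmap(p, kLen);
    } else {
      // On non-DAX files the kernel rejects MAP_SYNC (typically EOPNOTSUPP).
      std::fprintf(stderr, "mmap(MAP_SYNC) in %s failed: %s\n", dir.c_str(),
                   std::strerror(errno));
    }
  }
  close(fd);
  unlink(path.c_str());
  return ok;
}
```

Called as `supports_map_sync("/mnt/pmem0-ext4-dax")`, a `false` result usually means the filesystem was not mounted with `-o dax` or the namespace is not in fsdax mode.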

Build and Run

  • Build the MadFS shared library

    # Usage: make [release|debug|relwithdebinfo|profile|pmemcheck|asan|ubsan|msan|tsan]
    #             [CMAKE_ARGS="-DKEY1=VAL1 -DKEY2=VAL2 ..."] 
    make BUILD_TARGETS="madfs"
  • Run your program with MadFS

    LD_PRELOAD=./build-release/libmadfs.so ./your_program

    Sample output:

    BuildOptions:
        build type:
            name: release
            debug: 0
            use_pmemcheck: 0
        hardware support:
            clwb: 1
            clflushopt: 1
            avx512f: 1
        features: 
            map_sync: 1
            map_populate: 1
            tx_flush_only_fsync: 1
            enable_timer: 0
        concurrency control:
            cc_occ: 1
            cc_mutex: 0
            cc_spinlock: 0
            cc_rwlock: 0
    
    RuntimeOptions:
        show_config: 1
        strict_offset_serial: 0
        log_file: None
        log_level: 1
    
    # Your program output here
    
    MadFS unloaded
  • Run tests

    ./scripts/run.py [test_basic|test_rc|test_sync|test_gc]
    # See `./scripts/run.py --help` for more options
    
  • Run and plot single-threaded benchmarks

    ./scripts/bench_st.py --filter="seq_pread"
    ./scripts/bench_st.py --filter="rnd_pread"
    ./scripts/bench_st.py --filter="seq_pwrite"
    ./scripts/bench_st.py --filter="rnd_pwrite"
    ./scripts/bench_st.py --filter="cow"
    ./scripts/bench_st.py --filter="append_pwrite"
    
    # Limit to a set of filesystems
    ./scripts/bench_st.py -f MadFS SplitFS
    
    # Profile a data point
    ./scripts/bench_st.py --filter="seq_pread/512" -f MadFS -b profile
    
    # See `./scripts/bench_st.py --help` for more options
  • Run and plot multi-threaded benchmarks

    ./scripts/bench_mt.py --filter="unif_0R"
    ./scripts/bench_mt.py --filter="unif_50R"
    ./scripts/bench_mt.py --filter="unif_95R"
    ./scripts/bench_mt.py --filter="unif_100R"
    ./scripts/bench_mt.py --filter="zipf_2k"
    ./scripts/bench_mt.py --filter="zipf_4k"
  • Run and plot metadata benchmarks

    ./scripts/bench_open.py
    ./scripts/bench_gc.py
  • Run and plot macrobenchmarks (SQLite and LevelDB)

    ./scripts/bench_tpcc.py
    ./scripts/bench_ycsb.py
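Because MadFS is loaded with `LD_PRELOAD`, it is meant to run unmodified programs, so any program using ordinary POSIX file calls can stand in for `./your_program`. The C++ sketch below is a hypothetical test program of our own: it writes a range with `pwrite`, calls `fsync`, and reads the range back with `pread`. This README does not list exactly which calls MadFS intercepts, so treat the function name `write_and_verify` and the example path below as illustrative assumptions rather than a documented example.

```cpp
#include <cassert>
#include <string>

#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

// Writes `data` at `offset` in `path`, calls fsync, then reads the same range
// back with pread. Returns true only if every step succeeds and the bytes match.
bool write_and_verify(const std::string& path, const std::string& data,
                      off_t offset) {
  assert(offset >= 0);
  const int fd = open(path.c_str(), O_RDWR | O_CREAT, 0644);
  if (fd < 0) return false;

  const auto n = static_cast<ssize_t>(data.size());
  bool ok = pwrite(fd, data.data(), data.size(), offset) == n && fsync(fd) == 0;

  std::string back(data.size(), '\0');
  ok = ok && pread(fd, back.data(), back.size(), offset) == n && back == data;

  close(fd);
  return ok;
}
```

Wrapped in a `main` that calls, for example, `write_and_verify("/mnt/pmem0-ext4-dax/madfs_demo", std::string(4096, 'x'), 0)` and compiled with `g++ -std=c++20`, the same binary can be run once directly and once with `LD_PRELOAD=./build-release/libmadfs.so` to compare results.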

Directory Structure

  • src/: Source code for the MadFS shared library

  • scripts/: Scripts for building, running, and plotting benchmarks

  • bench/: Source code for benchmarks

  • test/: Source code for tests

  • tools/: Source code for tools (e.g., gc, conversion, info)

  • cmake/: CMake modules

  • data/: Data files for benchmarks

Contact

If you have any questions, feel free to open an issue or contact Shawn Zhong (shawn.zhong@wisc.edu) and Chenhao Ye (chenhaoy@cs.wisc.edu). We are also happy to accept pull requests.