Some paper lists related to storage systems

Paper Reading List of Storage Systems

A reading list related to storage systems, including data deduplication, erasure coding, general storage, and other related topics (e.g., security), updated from time to time.

Data Deduplication

Summary

  1. Understanding Data Deduplication Ratios----SNIA'08 (link)
  2. A Survey and Classification of Storage Deduplication Systems----ACM Computing Surveys'14 (link)
  3. A Comprehensive Study of the Past, Present, and Future of Data Deduplication----Proceedings of the IEEE'16 (link)
  4. 99 Deduplication Problems----HotStorage'16 (link) (summary)
  5. A Survey of Secure Data Deduplication Schemes for Cloud Storage Systems----ACM Computing Surveys'17 (link)
  6. Backup to the Future: How Workload and Hardware Changes Continually Redefine Data Domain File Systems----IEEE Computer'17 (link)

Workload Analysis

  1. Characterizing Datasets for Data Deduplication in Backup Applications----IISWC'10 (link)
  2. A Study of Practical Deduplication----FAST'11 (link) summary
  3. Capacity Forecasting in a Backup Storage Environment----LISA'11 (link) summary
  4. Characteristics of Backup Workloads in Production Systems----FAST'12 (link) summary
  5. A Study on Data Deduplication in HPC Storage Systems----SC'12 (link)
  6. Inside Dropbox: Understanding Personal Cloud Storage Services----IMC'12 (link)
  7. Insights for Data Reduction in Primary Storage: a Practical Analysis----SYSTOR'12 (link)
  8. Modeling the Dropbox Client Behavior----ICC'14 (link)
  9. Identifying Trends in Enterprise Data Protection Systems----USENIX ATC'15 (link)
  10. A Long-Term User-Centric Analysis of Deduplication Patterns----MSST'16 (link)
  11. Getting back up: Understanding how enterprise data backups fail----USENIX ATC'16 (link)
  12. A Simulation Analysis of Redundancy and Reliability in Primary Storage Deduplication----TC'18 (link) summary
  13. Deduplication Analyses of Multimedia System Images----HotEdge'18 (link)
  14. Improving Docker Registry Design based on Production Workload Analysis----FAST'18 (link)

Deduplicated System Design

  1. Venti: A New Approach to Archival Storage----FAST'02 (link)
  2. Avoiding the Disk Bottleneck in the Data Domain Deduplication File System----FAST'08 (link) summary
  3. Sparse Indexing: Large Scale, Inline Deduplication Using Sampling and Locality----FAST'09 (link) summary
  4. Extreme Binning: Scalable, Parallel Deduplication for Chunk-based File Backup----MASCOTS'09 (link) summary
  5. I/O Deduplication: Utilizing Content Similarity to Improve I/O Performance----FAST'10 (link)
  6. dedupv1: Improving Deduplication Throughput using Solid State Drives (SSD)----MSST'10 (link) summary
  7. ChunkStash: Speeding up Inline Storage Deduplication using Flash Memory----USENIX ATC'10 (link)
  8. SiLo: A Similarity-Locality based Near-Exact Deduplication Scheme with Low RAM Overhead and High Throughput----USENIX ATC'11 (link)
  9. Building a High-performance Deduplication System----USENIX ATC'11 (link) summary
  10. Primary Data Deduplication - Large Scale Study and System Design----USENIX ATC'12 (link)
  11. iDedup: Latency-aware, Inline Data Deduplication for Primary Storage----FAST'12 (link) summary
  12. Deduplication in SSDs: Model and quantitative analysis----MSST'12 (link)
  13. Efficiently Storing Virtual Machine Backups----HotStorage'13 (link)
  14. Storage Efficiency Opportunities and Analysis for Video Repositories----HotStorage'15 (link)
  15. Deriving and Comparing Deduplication Techniques Using a Model-Based Classification----EuroSys'15 (link)
  16. Design Tradeoffs for Data Deduplication Performance in Backup Workloads----FAST'15 (link) summary
  17. Sorted Deduplication: How to Process Thousands of Backup Streams----MSST'16 (link)
  18. Backup to the Future: How Workload and Hardware Changes Continually Redefine Data Domain File Systems----IEEE Computer'17 (link)
  19. Can't We All Get Along? Redesigning Protection Storage for Modern Workloads----USENIX ATC'18 (link) summary
  20. SmartDedup: Optimizing Deduplication for Resource-constrained Devices----USENIX ATC'19 (link)
  21. DupHunter: Flexible High-Performance Deduplication for Docker Registries----USENIX ATC'20 (link)
  22. The Dilemma between Deduplication and Locality: Can Both be Achieved?----FAST'21 (link) summary
  23. SLIMSTORE: A Cloud-based Deduplication System for Multi-version Backups----ICDE'21 (link)
  24. Improving the Performance of Deduplication-Based Backup Systems via Container Utilization Based Hot Fingerprint Entry Distilling----ACM TOS'21 (link)
  25. BURST: A Chunk-Based Data Deduplication System with Burst-Encoded Fingerprint Matching----MSST'24 (link)

Restore Performance

  1. RevDedup: A Reverse Deduplication Storage System Optimized for Reads to Latest Backups----APSys'13 (link) summary
  2. ALACC: Accelerating Restore Performance of Data Deduplication Systems Using Adaptive Look-Ahead Window Assisted Chunk Caching----FAST'18 (link) summary
  3. Reducing Impact of Data Fragmentation Caused by In-line Deduplication----SYSTOR'12 (link)
  4. Reducing Fragmentation Impact with Forward Knowledge in Backup Systems with Deduplication----SYSTOR'15 (link)
  5. Assuring Demanded Read Performance of Data Deduplication Storage with Backup Datasets----MASCOTS'12 (link)
  6. Sliding Look-Back Window Assisted Data Chunk Rewriting for Improving Deduplication Restore Performance----FAST'19 (link) summary
  7. Improving Restore Speed for Backup Systems that Use Inline Chunk-Based Deduplication----FAST'13 (link) summary
  8. Chunk Fragmentation Level: An Effective Indicator for Read Performance Degradation in Deduplication Storage----HPCC'11
  9. Improving the Restore Performance via Physical Locality Middleware for Backup Systems----Middleware'20 (link) summary
  10. Efficient Hybrid Inline and Out-of-Line Deduplication for Backup Storage----ACM TOS'14 (link)

Secure Deduplication

  1. Convergent Dispersal: Toward Storage-Efficient Security in a Cloud-of-Clouds----HotStorage'14 (link) summary
  2. CDStore: Toward Reliable, Secure, and Cost-Efficient Cloud Storage via Convergent Dispersal----USENIX ATC'15 (link) summary
  3. Information Leakage in Encrypted Deduplication via Frequency Analysis----DSN'17 (link)
  4. DupLESS: Server-Aided Encryption for Deduplicated Storage----USENIX Security'13 (link) summary
  5. Side Channels in Cloud Services, the Case of Deduplication in Cloud Storage----S&P'10 (link) summary
  6. Side Channels in Deduplication: Trade-offs between Leakage and Efficiency----AsiaCCS'17 (link) summary
  7. On Information Leakage in Deduplicated Storage Systems----CCS Workshop'16 summary
  8. SecDep: A User-Aware Efficient Fine-Grained Secure Deduplication Scheme with Multi-Level Key Management----MSST'15 (link)
  9. Message-Locked Encryption and Secure Deduplication----EuroCrypt'13 summary
  10. Proofs of Ownership in Remote Storage Systems----CCS'11 (link)
  11. Tapping the Potential: Secure Chunk-based Deduplication of Encrypted Data for Cloud Backup----CNS'18 summary
  12. A Bandwidth-Efficient Middleware for Encrypted Deduplication----DSC'18 summary
  13. Bloom Filter Based Privacy Preserving Deduplication System----Springer International Conference on Security & Privacy'19 (link) summary
  14. Enhanced Secure Thresholded Data Deduplication Scheme for Cloud Storage----TDSC'16 (link) summary
  15. Transparent Data Deduplication in the Cloud----CCS'15 (link) summary
  16. Secure Deduplication of Encrypted Data without Additional Independent Servers----CCS'15 (link) summary
  17. Fast and Secure Laptop Backups with Encrypted Deduplication----LISA'10 (link)
  18. Weak Leakage-Resilient Client-side Deduplication of Encrypted Data in Cloud Storage----AsiaCCS'13 (link)
  19. Lamassu: Storage-Efficient Host-Side Encryption----USENIX ATC'15 (link)
  20. Mitigating Traffic-based Side Channel Attacks in Bandwidth-efficient Cloud Storage----IPDPS'18 (link) summary
  21. RARE: Defeating Side Channels based on Data-Deduplication in Cloud Storage----INFOCOM'18 (link) summary
  22. PerfectDedup: Secure Data Deduplication----Data Privacy Management, and Security Assurance'15 (link)
  23. Privacy Aware Data Deduplication for Side Channel in Cloud Storage----ToCC'18 (link)
  24. PraDa: Privacy-preserving Data Deduplication as a Service----CIKM'14 (link)
  25. Privacy-Preserving Data Deduplication on Trusted Processors----CLOUD'17 (link) summary
  26. Distributed Key Generation for Encrypted Deduplication: Achieving the Strongest Privacy----CCSW'14 (link) summary
  27. Proofs of Ownership on Encrypted Cloud Data via Intel SGX----ACNS'20 (link) summary
  28. Accelerating Encrypted Deduplication via SGX----USENIX ATC'21 (link)
  29. S2Dedup: SGX-enabled Secure Deduplication----SYSTOR'21 (link) summary
  30. Secure Deduplication of General Computations----USENIX ATC'15 (link)
  31. When Delta Sync Meets Message-Locked Encryption: a Feature-based Delta Sync Scheme for Encrypted Cloud Storage----ICDCS'21 (link) summary
  32. DUPEFS: Leaking Data Over the Network With Filesystem Deduplication Side Channels----FAST'22 (link) summary

Metadata Management

  1. Metadedup: Deduplicating Metadata in Encrypted Deduplication via Indirection----MSST'19 (link)
  2. Rekeying for Encrypted Deduplication Storage----DSN'16 (link) summary
  3. File Recipe Compression in Data Deduplication Systems----FAST'13 (link) summary
  4. Metadata Considered Harmful ... to Deduplication----HotStorage'15 (link) summary

Indexing & Caching

  1. LIPA: A Learning-based Indexing and Prefetching Approach for Data Deduplication----MSST'19 (link) summary
  2. Lazy Exact Deduplication----MSST'16 (link)
  3. MAD2: A Scalable High-throughput Exact Deduplication Approach for Network Backup Services----MSST'10 (link)
  4. Block Locality Caching for Data Deduplication----SYSTOR'13 (link)
  5. HANDS: A Heuristically Arranged Non-Backup In-line Deduplication System----ICDE'13 (link)

Deduplication Estimation

  1. Estimating Unseen Deduplication - from Theory to Practice----FAST'16 (link) summary
  2. Estimation of Deduplication Ratios in Large Data Sets----MSST'12 (link) summary
  3. Sketching Volume Capacities in Deduplicated Storage----FAST'19 (link) summary
  4. Estimating Duplication by Content-based Sampling----USENIX ATC'13 summary
  5. Content-aware Load Balancing for Distributed Backup----LISA'11 (link)
  6. Rangoli: Space Management in Deduplication Environments----SYSTOR'13 (link) summary

Tiering Deduplication

  1. Data Domain Cloud Tier: Backup here, Backup there, Deduplicated Everywhere!----USENIX ATC'19 (link) summary
  2. InftyDedup: Scalable and Cost-Effective Cloud Tiering with Deduplication----FAST'23 (link) summary

Post-Deduplication: Data Compression, Delta Compression, and Application

  1. Redundancy Elimination Within Large Collections of Files----USENIX ATC'04 (link)
  2. The Design of a Similarity Based Deduplication System----SYSTOR'09 (link)
  3. Delta Compressed and Deduplicated Storage Using Stream-Informed Locality----HotStorage'12 (link) summary
  4. WAN Optimized Replication of Backup Datasets Using Stream-Informed Delta Compression----FAST'12 (link) summary
  5. To Zip or not to Zip: Effective Resource Usage for Real-Time Compression----FAST'13 (link) summary
  6. Combining Deduplication and Delta Compression to Achieve Low-Overhead Data Reduction on Backup Datasets----DCC'14 (link) summary
  7. Ddelta: A Deduplication-inspired Fast Delta Compression Approach----Performance'14 (link)
  8. Migratory Compression: Coarse-grained Data Reordering to Improve Compressibility----FAST'14 (link) summary
  9. Odess: Speeding up Resemblance Detection for Redundancy Elimination by Fast Content-Defined Sampling----ICDE'21 (link)
  10. Reducing Replication Bandwidth for Distributed Document Databases----SoCC'15 (link)
  11. Edelta: A Word-Enlarging Based Fast Delta Compression Approach----HotStorage'15 (link)
  12. Online Deduplication for Databases----SIGMOD'17 (link)
  13. Finesse: Fine-Grained Feature Locality based Fast Resemblance Detection for Post-Deduplication Delta Compression----FAST'19 (link) summary
  14. Improving Restore Performance for In-Line Backup System Combining Deduplication and Delta Compression----TPDS'20 (link)
  15. Exploring the Potential of Fast Delta Encoding: Marching to a Higher Compression Ratio----CLUSTER'20 (link) summary
  16. Adaptively Compressing IoT Data on the Resource-constrained Edge----HotEdge'20 (link)
  17. Length Preserving Compression – Marrying Encryption with Compression----SYSTOR'21 (link) summary
  18. DeepSketch: A New Machine Learning-Based Reference Search Technique for Post-Deduplication Delta Compression----FAST'22 (link) summary
  19. Building a High Performance Fine-grained Deduplication Framework for Backup Storage with High Deduplication Ratio----USENIX ATC'22 (link) summary
  20. Donag: Generating Efficient Patches and Diffs for Compressed Archives----ACM TOS'22 (link)
  21. LoopDelta: Embedding Locality-aware Opportunistic Delta Compression in Inline Deduplication for Highly Efficient Data Reduction----USENIX ATC'23 (link)
  22. Palantir: Hierarchical Similarity Detection for Post-Deduplication Delta Compression----ASPLOS'24 (link)
  23. DedupSearch: Two-Phase Deduplication Aware Keyword Search----FAST'22 (link) summary
  24. Physical vs. Logical Indexing with IDEA: Inverted Deduplication-Aware Index----FAST'24 (link) summary
  25. Is Low Similarity Threshold A Bad Idea in Delta Compression?----HotStorage'24 (link)

Memory & Block-Layer Deduplication

  1. CAFTL: A Content-Aware Flash Translation Layer Enhancing the Lifespan of Flash Memory based Solid State Drives----FAST'11 (link) summary
  2. XLM: More Effective Memory Deduplication Scanners through Cross-Layer Hints----USENIX ATC'13 (link)
  3. Dmdedup: Device Mapper Target for Data Deduplication----OLS'14 (link)
  4. Using Hints to Improve Inline Block-Layer Deduplication----FAST'16 (link) summary
  5. OrderMergeDedup: Efficient, Failure-Consistent Deduplication on Flash----FAST'16 (link)
  6. UKSM: Swift Memory Deduplication via Hierarchical and Adaptive Memory Region Distilling----FAST'18 (link) summary
  7. Remap-SSD: Safely and Efficiently Exploiting SSD Address Remapping to Eliminate Duplicate Writes----FAST'21 (link)
  8. Memory Deduplication for Serverless Computing with Medes----EuroSys'22 (link)
  9. On the Effectiveness of Same-Domain Memory Deduplication----EuroSec'22 (link)
  10. Dedup-for-Speed: Storing Duplications in Fast Programming Mode for Enhanced Read Performance----SYSTOR'22 (link)

Data Chunking

  1. A Framework for Analyzing and Improving Content-Based Chunking Algorithms----HP Technical Report'05 (link)
  2. Multi-Level Comparison of Data Deduplication in a Backup Scenario----SYSTOR'09 (link)
  3. Frequency Based Chunking for Data De-Duplication----MASCOTS'10 (link) summary
  4. Bimodal Content Defined Chunking for Backup Streams----FAST'10 (link)
  5. MUCH: Multi-threaded Content-Based File Chunking----TC'15 (link)
  6. FastCDC: a Fast and Efficient Content-Defined Chunking Approach for Data Deduplication----USENIX ATC'16 (link) summary
  7. SS-CDC: A Two-stage Parallel Content-Defined Chunking for Deduplicating Backup Storage----SYSTOR'19 (link) summary
  8. RapidCDC: Leveraging Duplicate Locality to Accelerate Chunking in CDC-based Deduplication Systems----SoCC'19 (link) summary

Cache Deduplication

  1. PLC-cache: Endurable SSD cache for deduplication-based primary storage----MSST'14 (link)
  2. Nitro: A Capacity-Optimized SSD Cache for Primary Storage----USENIX ATC'14 (link)
  3. CDAC: Content-Driven Deduplication-Aware Storage Cache----MSST'19 (link)
  4. Austere Flash Caching with Deduplication and Compression----USENIX ATC'20 (link)

Garbage Collection

  1. Memory Efficient Sanitization of a Deduplicated Storage System----FAST'13 (link) summary
  2. Concurrent Deletion in a Distributed Content-addressable Storage System with Global Deduplication----FAST'13 (link)
  3. Accelerating Restore and Garbage Collection in Deduplication-based Backup System via Exploiting Historical Information----USENIX ATC'14 (link) summary
  4. The Logic of Physical Garbage Collection in Deduplicating Storage----FAST'17 (link)

Network Deduplication

  1. EF-Dedup: Enabling Collaborative Data Deduplication at the Network Edge----ICDCS'19 (link)

Distributed Deduplication

  1. Even Data Placement for Load Balance in Reliable Distributed Deduplication Storage Systems----IWQoS'15 (link) summary
  2. Probabilistic Deduplication for Cluster-Based Storage Systems----SoCC'12 (link) summary
  3. A Scalable Inline Cluster Deduplication Framework for Big Data Protection----Middleware'12 (link)
  4. Tradeoffs in Scalable Data Routing for Deduplication Clusters----FAST'11 (link) summary
  5. Cluster and Single-Node Analysis of Long-Term Deduplication Patterns----ACM TOS'18 (link) summary
  6. Decentralized Deduplication in SAN Cluster File Systems----USENIX ATC'09 (link)
  7. HYDRAstor: A Scalable Secondary Storage----FAST'09 (link)
  8. GoSeed: Generating an Optimal Seeding Plan for Deduplicated Storage----FAST'20 (link)
  9. The what, The from, and The to: The Migration Games in Deduplicated Systems----FAST'22 (link) summary

Deduplication in NVM

  1. Nv-dedup: High performance Inline Deduplication for Non-volatile Memory----TC'17 (link)
  2. Improving the Performance and Endurance of Encrypted Non-volatile Main Memory through Deduplicating Writes----MICRO'18 (link)
  3. DeNOVA: Deduplication Extended NOVA File System----IPDPS'22 (link)
  4. Light-Dedup: A Light-weight Inline Deduplication Framework for Non-Volatile Memory File Systems----USENIX ATC'23 (link) summary

Erasure Coding & RAID

Erasure Coding Basics

  1. Network Coding for Distributed Storage Systems----TIT'09
  2. A Performance Evaluation and Examination of Open-Source Erasure Coding Libraries for Storage----FAST'09
  3. Erasure Coding for Cloud Storage Systems: A Survey----By Jun Li in 2013

Improve Data Recovery

  1. CORE: Augmenting Regenerating-Coding-Based Recovery for Single and Concurrent Failures in Distributed Storage Systems----MSST'13
  2. Degraded-First Scheduling for MapReduce in Erasure-Coded Storage Clusters----DSN'14
  3. Repair Pipelining for Erasure-Coded Storage----USENIX ATC'17
  4. A Tale of Two Erasure Codes in HDFS----FAST'15
  5. On the Speedup of Single-Disk Failure Recovery in XOR-Coded Storage Systems: Theory and Practice----MSST'12
  6. Rethinking Erasure Codes for Cloud File Systems: Minimizing I/O for Recovery and Degraded Reads----FAST'12
  7. Lazy Means Smart: Reducing Repair Bandwidth Costs in Erasure-coded Distributed Storage----SYSTOR'14
  8. Enabling Efficient and Reliable Transition from Replication to Erasure Coding for Clustered File System----DSN'15
  9. Reconsidering Single Failure Recovery in Clustered File Systems----DSN'16 summary
  10. RAFI: Risk-Aware Failure Identification to Improve the RAS in Erasure-coded Data Center----USENIX ATC'18
  11. Partial-Parallel-Repair (PPR): A Distributed Technique for Repairing Erasure Coded Storage----EuroSys'16

EC Update Issue

  1. Cross-Rack-Aware Updates in Erasure-Coded Data Centers----ICPP'18
  2. PARIX: Speculative Partial Writes in Erasure-Coded Systems----USENIX ATC'17

EC Framework

  1. OpenEC: Toward Unified and Configurable Erasure Coding Management in Distributed Storage Systems----FAST'19
  2. ParaRC: Embracing Sub-Packetization for Repair Parallelization in MSR-Coded Storage----FAST'23 (link)

New EC Code

  1. CodePlugin: Plugging Deduplication into Erasure Coding for Cloud Storage----HotCloud'15
  2. Double Regenerating Codes for Hierarchical Data Centers----ISIT'16
  3. Pyramid codes: Flexible schemes to trade space for access efficiency in reliable data storage systems----ACM TOS'13
  4. Having Your Cake and Eating It Too: Jointly Optimal Erasure Codes for I/O, Storage and Network-bandwidth----FAST'15
  5. Opening the Chrysalis: On the Real Repair Performance of MSR Codes----FAST'16
  6. NCCloud: A Network-Coding-Based Storage System in a Cloud-of-Clouds----FAST'12
  7. Erasure Coding in Windows Azure Storage----USENIX ATC'12
  8. XORing Elephants: Novel Erasure Codes for Big Data----VLDB'13
  9. Clay Codes: Moulding MDS Codes to Yield an MSR Code----FAST'18
  10. Alpha Entanglement Codes: Practical Erasure Codes to Archive Data in Unreliable Environments----DSN'18 summary
  11. On Fault Tolerance, Locality, and Optimality in Locally Repairable Codes----USENIX ATC'18
  12. Parallelism-Aware Locally Repairable Code for Distributed Storage Systems----ICDCS'18
  13. Beehive: Erasure Codes for Fixing Multiple Failures in Distributed Storage Systems----HotStorage'15
  14. Pipelined regeneration with Regenerating Codes for Distributed Storage Systems----NetCod'11
  15. Cooperative Pipelined Regeneration in Distributed Storage Systems----INFOCOM'14
  16. Zebra: Demand-aware Erasure Coding for Distributed Storage Systems----IWQoS'16
  17. On Data Parallelism of Erasure Coding in Distributed Storage Systems----ICDCS'17

EC System

  1. Giza: Erasure Coding Objects across Global Data Centers----USENIX ATC'17
  2. EC-Store: Bridging the Gap Between Storage and Latency in Distributed Erasure Coded Systems----ICDCS'18
  3. Latency Reduction and Load Balancing in Coded Storage Systems----SoCC'17

RAID

  1. RAID+: Deterministic and Balanced Data Distribution for Large Disk Enclosures----FAST'21 (link)
  2. FusionRAID: Achieving Consistent Low Latency for Commodity SSD Arrays----FAST'22 (link)

Security

Survey

  1. A Survey on Systems Security Metrics----ACM Computing Surveys'16

Secret Sharing

  1. How to Best Share a Big Secret----SYSTOR'18 (link) summary
  2. AONT-RS: Blending Security and Performance in Dispersed Storage Systems----FAST'11
  3. Splinter: Practical Private Queries on Public Data----NSDI'17

Data Encryption

  1. Efficient Homophonic Coding----TIT'99 (link)
  2. How Far Can we Go Beyond Linear Cryptanalysis?----ASIACRYPT'04 (link)
  3. CryptDB: Protecting Confidentiality with Encrypted Query Processing----SOSP'11 (link)
  4. Dark Clouds on the Horizon: Using Cloud Storage as Attack Vector and Online Slack Space----USENIX Security'11 (link)
  5. RAPPOR: Randomized Aggregable Privacy-Preserving Ordinal Response----CCS'14 (link)
  6. Frequency-Hiding Order-Preserving Encryption----CCS'15 (link)
  7. Inference Attacks on Property-Preserving Encrypted Databases----CCS'15 (link)
  8. A Note on the Optimality of Frequency Analysis vs. lp-Optimization----IACR'15 (link)
  9. Oblivious RAM as a Substrate for Cloud Storage - The Leakage Challenge Ahead----CCSW'16 (link) summary
  10. Oblivious RAM: A Dissection and Experimental Evaluation----VLDB'16 (link)
  11. MiniCrypt: Reconciling Encryption and Compression for Big Data Stores----EuroSys'17 (link)
  12. Splinter: Practical Private Queries on Public Data----NSDI'17 (link)
  13. Frequency-smoothing Encryption: Preventing Snapshot Attacks on Deterministically Encrypted Data----IACR'17 (link) summary
  14. The Overhead of Confidentiality and Client-side Encryption in Cloud Storage Systems----UCC'19 (link) summary
  15. PRO-ORAM: Practical Read-Only Oblivious RAM----RAID'19 (link)
  16. Quantifying Information Leakage of Deterministic Encryption----CCSW'19 (link) summary
  17. Pancake: Frequency Smoothing for Encrypted Data Stores----USENIX Security'20 (link)
  18. Hiding the Lengths of Encrypted Messages via Gaussian Padding----CCS'21 (link)
  19. On Fingerprinting Attacks and Length-Hiding Encryption----CT-RSA'22 (link)
  20. Rethinking Block Storage Encryption with Virtual Disks----HotStorage'22 (link) summary

Secure Deletion

  1. Secure Overlay Cloud Storage with Access Control and Assured Deletion----TDSC'12 (link) summary

Differential Privacy

  1. Differential Privacy----ICALP'06 (link)
  2. Calibrating Noise to Sensitivity in Private Data Analysis----TCC'06 (link)
  3. Differentially Private Access Patterns for Searchable Symmetric Encryption----INFOCOM'18 (link) summary
  4. Privacy at Scale: Local Differential Privacy in Practice----SIGMOD'18 (link)

SGX Technique

  1. Graphene-SGX: A Practical Library OS for Unmodified Applications on SGX----USENIX ATC'17 (link)
  2. Intel SGX Explained----IACR'16 (link)
  3. OpenSGX: An Open Platform for SGX Research----NDSS'16 (link)
  4. SCONE: Secure Linux Containers with Intel SGX----OSDI'16 (link)
  5. Varys: Protecting SGX Enclaves From Practical Side-Channel Attacks----USENIX ATC'18 (link)
  6. sgx-perf: A Performance Analysis Tool for Intel SGX Enclaves----Middleware'18 (link) summary
  7. TaLoS: Secure and Transparent TLS Termination inside SGX Enclaves----arxiv'17 (link) summary
  8. Switchless Calls Made Practical in Intel SGX----SysTEX'18 (link) summary
  9. Regaining Lost Seconds: Efficient Page Preloading for SGX Enclaves----Middleware'20 (link)
  10. Everything You Should Know About Intel SGX Performance on Virtualized Systems----SIGMETRICS'19 (link) summary
  11. A Comparison Study of Intel SGX and AMD Memory Encryption Technology----HASP'18 (link)
  12. SGXoMeter: Open and Modular Benchmarking for Intel SGX----EuroSec'21 (link)
  13. Foreshadow: Extracting the Keys to the Intel SGX Kingdom with Transient Out-of-Order Execution----USENIX Security'18 (link)

SGX Storage

  1. NEXUS: Practical and Secure Access Control on Untrusted Storage Platforms using Client-side SGX----DSN'19 (link)
  2. Securing the Storage Data Path with SGX Enclaves----arxiv'18 (link) summary
  3. EnclaveDB: A Secure Database using SGX----S&P'18 (link)
  4. Isolating Operating System Components with Intel SGX----SysTEX'16 (link)
  5. SPEICHER: Securing LSM-based Key-Value Stores using Shielded Execution----FAST'19 (link) summary
  6. ShieldStore: Shielded In-memory Key-Value Storage with SGX----EuroSys'19 (link) summary
  7. SeGShare: Secure Group File Sharing in the Cloud using Enclaves----DSN'20 (link) summary
  8. DISKSHIELD: A Data Tamper-Resistant Storage for Intel SGX----AsiaCCS'20 (link)
  9. SPEED: Accelerating Enclave Applications via Secure Deduplication----ICDCS'19 (link) summary
  10. Secure In-memory Key-Value Storage with SGX----SoCC'18
  11. EnclaveCache: A Secure and Scalable Key-value Cache in Multi-tenant Clouds using Intel SGX----Middleware'19 (link) summary
  12. Building enclave-native storage engines for practical encrypted databases----VLDB'21 (link)
  13. Aria: Tolerating Skewed Workloads in Secure In-memory Key-value Stores----ICDE'21 (link)

Network Security

  1. A Privacy-Preserving Defense Mechanism Against Request Forgery Attacks----TrustCom'11 (link) summary
  2. Internet Censorship in Thailand: User Practices and Potential Threats----EuroS&P'17 (link)
  3. Accessing Google Scholar under Extreme Internet Censorship: A Legal Avenue----Middleware'17 (link)
  4. How China Detects and Blocks Shadowsocks----IMC'20 (link)
  5. How the Great Firewall of China Detects and Blocks Fully Encrypted Traffic----USENIX Security'23 (link)

General Storage

HDD, SMR

  1. Revisiting HDD Rules of Thumb: 1/3 Is Not (Quite) the Average Seek Distance----MSST'24 (link)

Distributed Storage System

  1. MapReduce: Simplified Data Processing on Large Clusters----OSDI'04 (link)
  2. Cumulus: Filesystem Backup to the Cloud----FAST'09 (link) summary
  3. RACS: A Case for Cloud Storage Diversity----SoCC'10 (link)
  4. The Hadoop Distributed File System----MSST'10 (link) summary
  5. SPANStore: Cost-Effective Geo-Replicated Storage Spanning Multiple Cloud Services----SOSP'13 (link) summary
  6. A Day Late and a Dollar Short: The Case for Research on Cloud Billing Systems----HotCloud'14 (link)
  7. CosTLO: Cost-Effective Redundancy for Lower Latency Variance on Cloud Storage Service----NSDI'15 (link)
  8. Kurma: Secure Geo-Distributed Multi-Cloud Storage Gateways----SYSTOR'19 (link) summary
  9. Ursa: Hybrid Block Storage for Cloud-Scale Virtual Disks----EuroSys'19 (link)
  10. Duplicacy: A New Generation of Cloud Backup Tool Based on Lock-Free Deduplication----TCC'20 (link) summary

Consensus

  1. In Search of an Understandable Consensus Algorithm----USENIX ATC'14 (link)

Cache

  1. TinyLFU: A Highly Efficient Cache Admission Policy----ACM TOS'17 (link)
  2. Hyperbolic Caching: Flexible Caching for Web Applications----USENIX ATC'17 (link)
  3. Flashield: a Hybrid Key-value Cache that Controls Flash Write Amplification----USENIX NSDI'19 (link)
  4. It’s Time to Revisit LRU vs. FIFO----HotStorage'20 (link) summary trace
  5. The CacheLib Caching Engine: Design and Experiences at Scale----OSDI'20 (link)
  6. Unifying the Data Center Caching Layer — Feasible? Profitable?----HotStorage'21 (link)
  7. Learning Cache Replacement with Cacheus----FAST'21 (link)
  8. Kangaroo: Caching Billions of Tiny Objects on Flash----SOSP'21 (link)
  9. Segcache: a Memory-efficient and Scalable In-memory Key-value Cache for Small Objects----NSDI'21 (link)
  10. FarReach: Write-back Caching in Programmable Switches----USENIX ATC'23 (link)
  11. FIFO can be Better than LRU: the Power of Lazy Promotion and Quick Demotion----HotOS'23 (link)

Hash

  1. An Analysis of Compare-by-Hash----HotOS'03 (link)
  2. On-the-Fly Verification of Rateless Erasure Codes for Efficient Content Distribution----S&P'04 (link)
  3. Compare-by-Hash: A Reasoned Analysis----USENIX ATC'06 (link) summary
  4. Don’t Thrash: How to Cache your Hash on Flash----HotStorage'11 (link)
  5. Algorithmic Improvements for Fast Concurrent Cuckoo Hashing----EuroSys'14 (link)

Lock-free Storage

  1. A Lock-Free, Cache-Efficient Multi-Core Synchronization Mechanism for Line-Rate Network Traffic Monitoring----IPDPS'10 (link)
  2. Lock-Free Collaboration Support for Cloud Storage Services with Operation Inference and Transformation----FAST'20 (link)

SSD, Flash

  1. Design Tradeoffs for SSD Performance----USENIX ATC'08 (link)
  2. Design Tradeoffs for SSD Reliability----FAST'19 (link)
  3. The Tail at Store: A Revelation from Millions of Hours of Disk and SSD Deployments----FAST'16 (link)
  4. The Unwritten Contract of Solid State Drives----EuroSys'17 (link)
  5. The CASE of FEMU: Cheap, Accurate, Scalable and Extensible Flash Emulator----FAST'18 (link) summary
  6. From blocks to rocks: a natural extension of zoned namespaces----HotStorage'21 (link)
  7. Don’t Be a Blockhead: Zoned Namespaces Make Work on Conventional SSDs Obsolete----HotOS'21 (link) summary
  8. What Systems Researchers Need to Know about NAND Flash----HotStorage'13 (link)
  9. Caveat-Scriptor: Write Anywhere Shingled Disks----HotStorage'15 (link)
  10. Towards an Unwritten Contract of Intel Optane SSD----HotStorage'19 (link)
  11. Improving the Reliability of Next Generation SSDs using WOM-v Codes----FAST'22 (link)
  12. Fantastic SSD internals and how to learn and use them----SYSTOR'22 (link)
  13. Understanding NVMe Zoned Namespace (ZNS) Flash SSD Storage Devices----arxiv'22 (link)
  14. Compaction-Aware Zone Allocation for LSM based Key-Value Store on ZNS SSDs----HotStorage'22 (link)
  15. Lifetime-leveling LSM-tree Compaction for ZNS SSD----HotStorage'22 (link)
  16. What You Can't Forget: Exploiting Parallelism for Zoned Namespaces----HotStorage'22 (link)
  17. NVMe SSD Failures in the Field: the Fail-Stop and the Fail-Slow----USENIX ATC'22 (link)
  18. Offline and Online Algorithms for SSD Management----ACM TOS'22 (link)
  19. NVMeVirt: A Versatile Software-defined Virtual NVMe Device----FAST'23 (link)
  20. Excessive SSD-Internal Parallelism Considered Harmful----HotStorage'23 (link)
  21. Is Garbage Collection Overhead Gone? Case study of F2FS on ZNS SSDs----HotStorage'23 (link)
  22. ZapRAID: Toward High-Performance RAID for ZNS SSDs via Zone Append----APSys'23 (link)
  23. BypassD: Enabling fast userspace access to shared SSDs----ASPLOS'24 (link)

Open-Channel SSD, ZNS, SMR

  1. LightNVM: The Linux Open-Channel SSD Subsystem----USENIX FAST'17 (link)
  2. ZoneAlloy: Elastic Data and Space Management for Hybrid SMR Drives----HotStorage'19 (link)
  3. Zone Append: A New Way of Writing to Zoned Storage----Vault'20 (link)
  4. ZNS: Avoiding the Block Interface Tax for Flash-based SSDs----USENIX ATC'21 (link) code
  5. ZNS+: Advanced Zoned Namespace Interface for Supporting In-Storage Zone Compaction----OSDI'21 (link)
  6. RAIZN: Redundant Array of Independent Zoned Namespaces----ASPLOS'23 (link)
  7. An Efficient Order-Preserving Recovery for F2FS with ZNS SSD----HotStorage'23 (link)
  8. Is Garbage Collection Overhead Gone? Case study of F2FS on ZNS SSDs----HotStorage'23 (link)
  9. A Free-Space Adaptive Runtime Zone-Reset Algorithm for Enhanced ZNS Efficiency----HotStorage'23 (link)
  10. Can ZNS SSDs be Better Storage Devices for Persistent Cache?----HotStorage'24 (link) summary

Non-volatile Memory

  1. NOVA: A Log-structured File System for Hybrid Volatile/Non-volatile Main Memories----FAST'16 (link)
  2. Redesigning LSMs for Nonvolatile Memory with NoveLSM----USENIX ATC'18 (link) summary
  3. SLM-DB: Single-Level Key-Value Store with Persistent Memory----FAST'19 (link) summary
  4. An Empirical Guide to the Behavior and Use of Scalable Persistent Memory----FAST'20 (link)
  5. Characterizing the Performance of Intel Optane Persistent Memory: A Close Look at its On-DIMM Buffering----EuroSys'22 (link)

Data Structure

  1. An Introduction to Bε-trees and Write-Optimization----USENIX ;login:'15 (link) code
  2. Building Workload-Independent Storage with VT-Trees----FAST'13 (link)

Benchmark

  1. SDGen: Mimicking Datasets for Content Generation in Storage Benchmarks----FAST'15 (link)

I/O Optimizations

  1. BPF for Storage: An Exokernel-Inspired Approach----HotOS'21 (link) summary
  2. Understanding Modern Storage APIs: A systematic study of libaio, SPDK, and io_uring----SYSTOR'22 (link)
  3. PAIO: General, Portable I/O Optimizations With Minor Application Modifications----FAST'22 (link)
  4. zIO: Accelerating IO-Intensive Applications with Transparent Zero-Copy IO----OSDI'22 (link)
  5. XRP: In-Kernel Storage Functions with eBPF----OSDI'22 (link)
  6. HintStor: A Framework to Study I/O Hints in Heterogeneous Storage----ACM TOS'22 (link)

Deployed Systems

  1. The Google File System----SOSP'03 (link)
  2. Bigtable: A Distributed Storage System for Structured Data----OSDI'06 (link)
  3. Finding A Needle in Haystack: Facebook’s Photo Storage----OSDI'10 (link)
  4. f4: Facebook’s Warm BLOB Storage System----OSDI'14 (link)
  5. Amazon DynamoDB: A Scalable, Predictably Performant, and Fully Managed NoSQL Database Service----USENIX ATC'22 (link)
  6. CacheSack: Admission Optimization for Google Datacenter Flash Caches----USENIX ATC'22 (link)
  7. From Luna to Solar: The Evolutions of the Compute-to-Storage Networks in Alibaba Cloud----SIGCOMM'22 (link)

CXL

  1. Hello Bytes, Bye Blocks: PCIe Storage Meets Compute Express Link for Memory Expansion (CXL-SSD)----HotStorage'22 (link)

Failures

  1. Fail-Slow at Scale: Evidence of Hardware Performance Faults in Large Production Systems----FAST'18 (link)
  2. Metastable Failures in Distributed Systems----HotOS'21 (link)
  3. Metastable Failures in the Wild----OSDI'22 (link)

Ceph Related Research

  1. Replication Under Scalable Hashing: A Family of Algorithms for Scalable Decentralized Data Distribution----IPDPS'04 (link)
  2. Dynamic Metadata Management for Petabyte-scale File Systems----SC'04 (link)
  3. CRUSH: Controlled, Scalable, Decentralized Placement of Replicated Data----SC'06 (link)
  4. Ceph: A Scalable, High-performance Distributed File System----OSDI'06 (link) (slides)
  5. The Design and Implementation of AQuA: An Adaptive Quality of Service Aware Object-Based Storage Device----MSST'06 (link)
  6. Mantle: A Programmable Metadata Load Balancer for the Ceph File System----SC'15 (link)
  7. Understanding Write Behaviors of Storage Backends in Ceph Object Store----MSST'17 (link) slides
  8. Design of Global Data Deduplication for A Scale-out Distributed Storage System----ICDCS'18 (link)
  9. File Systems Unfit as Distributed Storage Backends: Lessons from 10 Years of Ceph Evolution----SOSP'19 (link) summary
  10. MAPX: Controlled Data Migration in the Expansion of Decentralized Object-Based Storage Systems----FAST'20 (link)
  11. Lunule: An Agile and Judicious Metadata Load Balancer for CephFS----SC'21 (link)
  12. Speculative Recovery: Cheap, Highly Available Fault Tolerance with Disaggregated Storage----USENIX ATC'22 (link)
  13. InfiniFS: An Efficient Metadata Service for Large-Scale Distributed Filesystems----FAST'22 (link)
  14. TiDedup: A New Distributed Deduplication Architecture for Ceph----USENIX ATC'23 (link)

HPC Storage

  1. GPFS: A Shared-Disk File System for Large Computing Clusters----FAST'02 (link)
  2. Efficient Object Storage Journaling in a Distributed Parallel File System----FAST'10 (link)
  3. Tips and Tricks for Diagnosing Lustre Problems on Cray Systems----CUG'11 (link)
  4. Lustre Resiliency: Understanding Lustre Message Loss and Tuning for Resiliency----CUG'15 (link)
  5. Taking back control of HPC file systems with Robinhood Policy Engine----arXiv'15 (link)
  6. Lustre Lockahead: Early Experience and Performance using Optimized Locking----CUG'17 (link)
  7. LPCC: Hierarchical Persistent Client Caching for Lustre----SC'19 (link) slides
  8. A Performance Study of Lustre File System Checker: Bottlenecks and Potentials----MSST'19 (link)
  9. I/O Characterization and Performance Evaluation of BeeGFS for Deep Learning----ICPP'19 (link)
  10. HadaFS: A File System Bridging the Local and Shared Burst Buffer for Exascale Supercomputers----FAST'23 (link)
  11. Accelerating I/O performance of ZFS-based Lustre file system in HPC environment----Journal of Supercomputing'23 (link)
  12. MetaWBC: POSIX-compliant Metadata Write-back Caching for Distributed File Systems----SC'22 (link)
  13. Xfast: Extreme File Attribute Stat Acceleration for Lustre----SC'23 (link) slides
  14. The I/O Trace Initiative: Building a Collaborative I/O Archive to Advance HPC----SC-workshop'23 (link)
  15. Combining Buffered I/O and Direct I/O in Distributed File Systems----FAST'24 (link) slides summary

File System

File Fragmentation

  1. The Effects of Filesystem Fragmentation----OLS'06 (link)
  2. Ext4 Block and Inode Allocator Improvements----OLS'08 (link)
  3. File Systems Fated for Senescence? Nonsense, Says Science!----FAST'17 (link)
  4. Filesystem Aging: It's more Usage than Fullness----HotStorage'19 (link)

File System Analysis

  1. Understanding Configuration Dependencies of File Systems----HotStorage'22 (link)
  2. ConfD: Analyzing Configuration Dependencies of File Systems for Fun and Profit----FAST'23 (link)

Journaling

  1. Journaling of Journal Is (Almost) Free----FAST'14 (link)
  2. iJournaling: Fine-Grained Journaling for Improving the Latency of Fsync System Call----USENIX ATC'17 (link)
  3. FastCommit: Resource-efficient, Performant and Cost-effective File System Journaling----USENIX ATC'24 (link)

Page Cache

  1. StreamCache: Revisiting Page Cache for File Scanning on Fast Storage Devices----USENIX ATC'24 (link)

System Design

  1. The Linear Tape File System----MSST'10 (link)
  2. Scale and Concurrency of GIGA+: File System Directories with Millions of Files----FAST'11 (link)
  3. F2FS: A New File System for Flash Storage----FAST'15 (link)
  4. POSIX is Dead! Long Live... errr... What Exactly?----HotStorage'17 (link)
  5. BetrFS: A Right-Optimized Write-Optimized File System----FAST'15 (link)
  6. The Full Path to Full-Path Indexing----FAST'18 (link)
  7. SplitFS: Reducing Software Overhead in File Systems for Persistent Memory----SOSP'19 (link)
  8. EROFS: A Compression-friendly Readonly File System for Resource-scarce Devices----USENIX ATC'19 (link)
  9. How to Copy Files----FAST'20 (link)
  10. WineFS: a hugepage-aware file system for persistent memory that ages gracefully----SOSP'21 (link)
  11. LineFS: Efficient SmartNIC Offload of a Distributed File System with Pipeline Parallelism----SOSP'21 (link)
  12. BetrFS: A Compleat File System for Commodity SSDs----EuroSys'22 (link)

FUSE

  1. To FUSE or Not to FUSE: Performance of User-Space File Systems----FAST'17 (link)
  2. Performance and Resource Utilization of FUSE User-Space File Systems----ACM TOS'19 (link)
  3. XFUSE: An Infrastructure for Running Filesystem Services in User Space----USENIX ATC'21 (link)

Survey

  1. Survey of Distributed File System Design Choices----ACM TOS'22 (link)

Storage + AI

LLM in Storage

  1. Can Modern LLMs Tune and Configure LSM-based Key-Value Stores?----HotStorage'24 (link)