A collection of academic papers, articles, and other resources that I plan to read or have read. The content has a focus on distributed systems.
I have also included my notes on select resources to summarize important takeaways and to help me better understand the material.
- Consistency Models
- Jepsen Analyses
- Strong Consistency Models
- Readings in Distributed Systems
- The Log: What every software engineering should know about real-time data's unifying abstraction
- Unreliable Failure Detectors for Reliable Distributed Systems
Chandra, Tushar Deepak, and Sam Toueg. 1996. “Unreliable Failure Detectors for Reliable Distributed Systems.” Journal of the ACM (JACM) 43 (2). ACM: 225–67.
- Impossibility of Distributed Consensus with One Faulty Process
Fischer, Michael J, Nancy A Lynch, and Michael S Paterson. 1982. “Impossibility of Distributed Consensus with One Faulty Process.” MASSACHUSETTS INST OF TECH CAMBRIDGE LAB FOR COMPUTER SCIENCE.
- Viewstamped Replication: A New Primary Copy Method to Support Highly-Available Distributed Systems
Oki, Brian M, and Barbara H Liskov. 1988. “Viewstamped Replication: A New Primary Copy Method to Support Highly-Available Distributed Systems.” In Proceedings of the Seventh Annual Acm Symposium on Principles of Distributed Computing, 8–17. ACM.
- Viewstamped Replication Revisited
Liskov, Barbara, and James Cowling. 2012. “Viewstamped Replication Revisited.”
- The Part Time Parliament
Lamport, Leslie, and others. 1998. “The Part-Time Parliament.” ACM Transactions on Computer Systems 16 (2): 133–69.
- Paxos Made Simple
Lamport, Leslie, and others. 2001. “Paxos Made Simple.” ACM Sigact News 32 (4): 18–25.
- Paxos Made Live - An Engineering Perspective
Chandra, Tushar D, Robert Griesemer, and Joshua Redstone. 2007. “Paxos Made Live: An Engineering Perspective.” In Proceedings of the Twenty-Sixth Annual Acm Symposium on Principles of Distributed Computing, 398–407. ACM.
- Flexible Paxos: Quorum Intersection Revisited
Howard, Heidi, Dahlia Malkhi, and Alexander Spiegelman. 2016. “Flexible Paxos: Quorum Intersection Revisited.” arXiv Preprint arXiv:1608.06696.
- Zab: High-Performance Broadcast for Primary-Backup Systems
Junqueira, Flavio P, Benjamin C Reed, and Marco Serafini. 2011. “Zab: High-Performance Broadcast for Primary-Backup Systems.” In 2011 Ieee/Ifip 41st International Conference on Dependable Systems & Networks (Dsn), 245–56. IEEE.
- ZooKeeper's Atomic Broadcast Protocol: Theory and Practice
Medeiros, André. 2012. “ZooKeeper’s Atomic Broadcast Protocol: Theory and Practice.” Aalto University School of Science 20.
- Vive la Différence: Paxos vs. Viewstamped Replication vs. Zab
Van Renesse, Robbert, Nicolas Schiper, and Fred B Schneider. 2015. “Vive La Différence: Paxos Vs. Viewstamped Replication Vs. Zab.” IEEE Transactions on Dependable and Secure Computing 12 (4). IEEE: 472–84.
- In Search of an Understandable Consensus Algorithm
Ongaro, Diego, and John Ousterhout. 2014. “In Search of an Understandable Consensus Algorithm.” In 2014 {Usenix} Annual Technical Conference ({Usenix}{ATC} 14), 305–19.
- Consensus: Bridging Theory and Practice
Ongaro, Diego. 2014. “Consensus: Bridging Theory and Practice.” PhD thesis, Stanford University.
- Raft Refloated: Do We Have Consensus?
Howard, Heidi, Malte Schwarzkopf, Anil Madhavapeddy, and Jon Crowcroft. 2015. “Raft Refloated: Do We Have Consensus?” Operating Systems Review 49 (1): 12–21.
- Time, Clocks, and the Ordering of Events in a Distributed System
Lamport, Leslie. 1978. “Time, Clocks, and the Ordering of Events in a Distributed System.” Communications of the ACM 21 (7). ACM: 558–65.
- Timestamps in Message-Passing Systems That Preserve the Partial Ordering
Fidge, Colin J. 1987. Timestamps in Message-Passing Systems That Preserve the Partial Ordering. Australian National University. Department of Computer Science.
- Virtual Time and Global States of Distributed Systems
Mattern, Friedemann, and others. 1988. Virtual Time and Global States of Distributed Systems. Citeseer.
- Logical Physical Clocks and Consistent Snapshots in Globally Distributed Databases
Demirbas, Murat, Marcelo Leone, Bharadwaj Avva, Deepak Madeppa, and Sandeep Kulkarni. 2014. “Logical Physical Clocks and Consistent Snapshots in Globally Distributed Databases.”
- The Bloom Clock
Ramabaja, Lum. 2019. “The Bloom Clock.” CoRR abs/1905.13064. http://arxiv.org/abs/1905.13064.
- Brewer's Conjecture and the Feasibility of Consistent, Available, Partition-Tolerant Web Services
Gilbert, Seth, and Nancy Lynch. 2002. “Brewer’s Conjecture and the Feasibility of Consistent, Available, Partition-Tolerant Web Services.” Acm Sigact News 33 (2). ACM: 51–59.
- Fast set operations using treaps
Blelloch, Guy E, and Margaret Reid-Miller. 1998. “Fast Set Operations Using Treaps.” In SPAA, 98:16–26.
- A Skip List Cookbook
Pugh, William. 1998. “A Skip List Cookbook.”
- Skip Lists: A Probabilistic Alternative to Balanced Trees
Pugh, William. 1990. “Skip Lists: A Probabilistic Alternative to Balanced Trees.” Communications of the ACM 33 (6).
- Less hashing, same performance: Building a better Bloom filter
Kirsch, Adam, and Michael Mitzenmacher. 2006. “Less Hashing, Same Performance: Building a Better Bloom Filter.” In European Symposium on Algorithms, 456–67. Springer.
- Advanced bloom filter based algorithms for efficient approximate data de-duplication in streams
Bera, Suman K, Sourav Dutta, Ankur Narang, and Souvik Bhattacherjee. 2012. “Advanced Bloom Filter Based Algorithms for Efficient Approximate Data de-Duplication in Streams.” arXiv Preprint arXiv:1212.3964.
- Cuckoo filter: Practically better than bloom
Fan, Bin, Dave G Andersen, Michael Kaminsky, and Michael D Mitzenmacher. 2014. “Cuckoo Filter: Practically Better Than Bloom.” In Proceedings of the 10th Acm International on Conference on Emerging Networking Experiments and Technologies, 75–88. ACM.
- Don't thrash: how to cache your hash on flash
Bender, Michael A, Martin Farach-Colton, Rob Johnson, Russell Kraner, Bradley C Kuszmaul, Dzejla Medjedovic, Pablo Montes, Pradeep Shetty, Richard P Spillane, and Erez Zadok. 2012. “Don’t Thrash: How to Cache Your Hash on Flash.” Proceedings of the VLDB Endowment 5 (11). VLDB Endowment: 1627–37.
- An improved data stream summary: the count-min sketch and its applications
Cormode, Graham, and Shan Muthukrishnan. 2005. “An Improved Data Stream Summary: The Count-Min Sketch and Its Applications.” Journal of Algorithms 55 (1). Elsevier: 58–75.
- A general-purpose counting filter: Making every bit count
Pandey, Prashant, Michael A Bender, Rob Johnson, and Rob Patro. 2017. “A General-Purpose Counting Filter: Making Every Bit Count.” In Proceedings of the 2017 Acm International Conference on Management of Data, 775–87. ACM.
- HyperLogLog: the analysis of a near-optimal cardinality estimation algorithm
Flajolet, Philippe, Éric Fusy, Olivier Gandouet, and Frédéric Meunier. 2007. “Hyperloglog: The Analysis of a Near-Optimal Cardinality Estimation Algorithm.” In Discrete Mathematics and Theoretical Computer Science, 137–56. Discrete Mathematics; Theoretical Computer Science.
- HyperLogLog in practice: algorithmic engineering of a state of the art cardinality estimation algorithm
Heule, Stefan, Marc Nunkesser, and Alexander Hall. 2013. “HyperLogLog in Practice: Algorithmic Engineering of a State of the Art Cardinality Estimation Algorithm.” In Proceedings of the 16th International Conference on Extending Database Technology, 683–92. ACM.
- LSM Tree
O’Neil, Patrick, Edward Cheng, Dieter Gawlick, and Elizabeth O’Neil. 1996. “The Log-Structured Merge-Tree (Lsm-Tree).” Acta Informatica 33 (4). Springer: 351–85.
- bLSM: A General Purpose LSM Tree
Sears, Russell, and Raghu Ramakrishnan. 2012. “BLSM: A General Purpose Log Structured Merge Tree.” In Proceedings of the 2012 Acm Sigmod International Conference on Management of Data, 217–28. ACM.
- A Comprehensive Study of Convergent and Commutative Replicated Data Types
Shapiro, Marc, Nuno Preguiça, Carlos Baquero, and Marek Zawirski. 2011. “A Comprehensive Study of Convergent and Commutative Replicated Data Types.” PhD thesis, Inria–Centre Paris-Rocquencourt; INRIA.
- Efficient Synchronization of State-based CRDTs
Enes, Vitor, Paulo Sérgio Almeida, Carlos Baquero, and João Leitão. 2018. “Efficient Synchronization of State-Based Crdts.” arXiv Preprint arXiv:1803.02750.
- Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications
Stoica, Ion, Robert Morris, David Liben-Nowell, David R Karger, M Frans Kaashoek, Frank Dabek, and Hari Balakrishnan. 2003. “Chord: A Scalable Peer-to-Peer Lookup Protocol for Internet Applications.” IEEE/ACM Transactions on Networking (TON) 11 (1). IEEE Press: 17–32.
- Kademlia: A Peer-to-peer Information System Based on the XOR Metric
Maymounkov, Petar, and David Mazieres. 2002. “Kademlia: A Peer-to-Peer Information System Based on the Xor Metric.” In International Workshop on Peer-to-Peer Systems, 53–65. Springer.
- Dynamo: Amazon's Highly Available Key-value Store
DeCandia, Giuseppe, Deniz Hastorun, Madan Jampani, Gunavardhan Kakulapati, Avinash Lakshman, Alex Pilchin, Swaminathan Sivasubramanian, Peter Vosshall, and Werner Vogels. 2007. “Dynamo: Amazon’s Highly Available Key-Value Store.” In ACM Sigops Operating Systems Review, 41:205–20. 6. ACM.
- Bigtable: A Distributed Storage System for Structured Data
Chang, Fay, Jeffrey Dean, Sanjay Ghemawat, Wilson C Hsieh, Deborah A Wallach, Mike Burrows, Tushar Chandra, Andrew Fikes, and Robert E Gruber. 2008. “Bigtable: A Distributed Storage System for Structured Data.” ACM Transactions on Computer Systems (TOCS) 26 (2). ACM: 4.
- Cassandra - A Decentralized Structured Storage System
Lakshman, Avinash, and Prashant Malik. 2010. “Cassandra: A Decentralized Structured Storage System.” ACM SIGOPS Operating Systems Review 44 (2). ACM: 35–40.
- Kafka: A Distributed Messaging System for Log Processing
Kreps, Jay, Neha Narkhede, Jun Rao, and others. 2011. “Kafka: A Distributed Messaging System for Log Processing.” In Proceedings of the Netdb, 1–7.
- Readings in Database Systems
- Spanner: Google's Globally-Distributed Database
Corbett, James C., Jeffrey Dean, Michael Epstein, Andrew Fikes, Christopher Frost, JJ Furman, Sanjay Ghemawat, et al. 2012. “Spanner: Google’s Globally-Distributed Database.” In OSDI.
- Spanner: Becoming a SQL System
Bacon, David F., Nathan Bales, Nico Bruno, Brian F. Cooper, Adam Dickinson, Andrew Fikes, Campbell Fraser, et al. 2017. “Spanner: Becoming a Sql System.” In Proc. SIGMOD 2017, 331–43.
- An Analysis of Network-Partitioning Failures in Cloud Systems
Alquraan, Ahmed, Hatem Takruri, Mohammed Alfatafta, and Samer Al-Kiswany. 2018. “An Analysis of Network-Partitioning Failures in Cloud Systems.” In 13th Usenix Symposium on Operating Systems Design and Implementation Osdi 18), 51–68.
- Why Is Random Testing Effective for Partition Tolerance Bugs?
Majumdar, Rupak, and Filip Niksic. 2017. “Why Is Random Testing Effective for Partition Tolerance Bugs?” Proceedings of the ACM on Programming Languages 2 (POPL). ACM: 46.