- Finding a needle in Haystack: Facebook's photo storage (https://lnkd.in/gSZYcmmB)
- f4: Facebook’s Warm BLOB Storage System (https://lnkd.in/gMEfTpAh)
- The Hadoop Distributed File System (https://lnkd.in/gSUqafDg)
- The Google File System (https://lnkd.in/giUResea)
- Facebook's Tectonic Filesystem: Efficiency from Exascale (https://lnkd.in/geg7-ub9)
- Pelican: A Building Block for Exascale Cold Data Storage (https://lnkd.in/gSse26YK)
- CRUSH: Controlled, Scalable, Decentralized Placement of Replicated Data (https://lnkd.in/gUbnK4rH)
- RADOS: a scalable, reliable storage service for petabyte-scale storage clusters (https://lnkd.in/gKwbmzTx)
- Megastore: Providing Scalable, Highly Available Storage for Interactive Services (https://lnkd.in/gT7mSDQN)
- The Design and Implementation of a Log-Structured File System (https://lnkd.in/gVuka_Ym)
- The RAMCloud Storage System (https://lnkd.in/gC3SQccF)
- Monarch: Google's Planet-Scale In-Memory Time Series Database (https://lnkd.in/gbqa7HNa)
- Gorilla: A Fast, Scalable, In-Memory Time Series Database (https://lnkd.in/gd_nUJbu)
- Scuba: Diving into Data at Facebook (https://lnkd.in/gfBrJcge)
- The Unified Logging Infrastructure for Data Analytics at Twitter (https://lnkd.in/gwhNUMnF)
- Cubrick: Indexing Millions of Records per Second for Interactive Analytics (https://lnkd.in/g-n9GUMD)
- Shark: SQL and Rich Analytics at Scale (https://lnkd.in/gqXHq5BG)
- Realtime Data Processing at Facebook (https://lnkd.in/gQdMN4kP)
- Large-scale cluster management at Google with Borg (https://lnkd.in/gT7bG2SF)
- Apollo: Scalable and Coordinated Scheduling for Cloud-Scale Computing (https://lnkd.in/gEEdRmcD)
- Apache Hadoop YARN: Yet Another Resource Negotiator (https://lnkd.in/g9SVx_Ft)
- Twine: A Unified Cluster Management System for Shared Infrastructure (https://lnkd.in/gbnuqutm)
- MillWheel: Fault-Tolerant Stream Processing at Internet Scale (https://lnkd.in/gC7VjCfG)
- The Dataflow Model: A Practical Approach to Balancing Correctness, Latency, and Cost in Massive-Scale, Unbounded, Out-of-Order Data Processing (https://lnkd.in/g-PyJUPa)
- Apache Flink™: Stream and Batch Processing in a Single Engine (https://lnkd.in/gpzRA6v3)
- Drizzle: Fast and Adaptable Stream Processing at Scale (https://lnkd.in/g9Hbnvp7)
- Kafka, Samza and the Unix Philosophy of Distributed Data (https://lnkd.in/grtHkFWN)
- Discretized Streams: Fault-Tolerant Streaming Computation at Scale (https://lnkd.in/gbzc3_Ke)
- Structured Streaming: A Declarative API for Real-Time Applications in Apache Spark (https://lnkd.in/gnQQP2UY)
- Noria: dynamic, partially-stateful data-flow for high-performance web applications (https://lnkd.in/gYtpef34)
- Kafka: a Distributed Messaging System for Log Processing (https://lnkd.in/dkfPsFwH)
- Scribe: Transporting petabytes per hour via a distributed, buffered queueing system (https://lnkd.in/dTyTBE_t)
- LogDevice: a distributed data store for logs (https://lnkd.in/dvVTBz46)
- Scalog: Seamless Reconfiguration and Total Order in a Scalable Shared Log (https://lnkd.in/d7xmexrQ)
- CORFU: A Shared Log Design for Flash Clusters (https://lnkd.in/dxiquk5h)
- The FuzzyLog: A Partially Ordered Shared Log (https://lnkd.in/da4ikmEa)
- Ubiq: A Scalable and Fault-tolerant Log Processing Infrastructure (https://lnkd.in/dQTfCDwH)
- Pregel: A System for Large-Scale Graph Processing (https://lnkd.in/ggpew7yq)
- PowerGraph: Distributed Graph-Parallel Computation on Natural Graphs (https://lnkd.in/g6f9Mjzk)
- GraphX: Graph Processing in a Distributed Dataflow Framework (https://lnkd.in/gixUZP46)
- Gemini: A Computation-Centric Distributed Graph Processing System (https://lnkd.in/gCs2R5EJ)
- TAO: Facebook’s Distributed Data Store for the Social Graph (https://lnkd.in/gfesm_Hn)
- Paxos Made Simple (https://lnkd.in/gk6nxyVj)
- Implementing Fault-Tolerant Services Using the State Machine Approach: A Tutorial (https://lnkd.in/gPwNde-i)
- The Chubby lock service for loosely-coupled distributed systems (https://lnkd.in/gFXKTrXR)
- ZooKeeper: Wait-free coordination for Internet-scale systems (https://lnkd.in/gWTYBxQN)
- In Search of an Understandable Consensus Algorithm (https://lnkd.in/gqrKhvsK)
- Virtual Consensus in Delos (https://lnkd.in/g5bitkdM)
- Gossip-Based Broadcast (https://lnkd.in/gT74Zb8Z)
- Gossiping in Distributed Systems (https://lnkd.in/g55DFbuP)
- Peer-to-peer membership management for gossip-based protocols (https://lnkd.in/g_XE4TiE)
- Gossip-based Peer Sampling (https://lnkd.in/gSPwEkaW)
- SWIM: Scalable Weakly-consistent Infection-style Process Group Membership Protocol (https://lnkd.in/gxZtR3Nh)
- Pastry: Scalable, decentralized object location and routing for large-scale peer-to-peer systems (https://lnkd.in/gyURBizm)
- Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications (https://lnkd.in/grVF9crk)
Additional articles (some may repeat entries above; to be categorized later):
No. | Short Name | Title | Link | Extra links |
---|---|---|---|---|
1 | Apache Kafka | Kafka: A Distributed Messaging System for Log Processing | https://notes.stephenholiday.com/Kafka.pdf | |
2 | Apache Cassandra | Cassandra: A Decentralized Structured Storage System | https://www.cs.cornell.edu/projects/ladis2009/papers/lakshman-ladis2009.pdf | |
3 | Apache Flink | Apache Flink: Stream and Batch Processing in a Single Engine | https://asterios.katsifodimos.com/assets/publications/flink-deb.pdf | |
4 | Apache Spark | Spark: Cluster Computing with Working Sets | https://www.usenix.org/legacy/event/hotcloud10/tech/full_papers/Zaharia.pdf | |
5 | Apache ZooKeeper | ZooKeeper: Wait-free coordination for Internet-scale systems | https://www.usenix.org/legacy/event/atc10/tech/full_papers/Hunt.pdf | |
6 | Bigtable | Bigtable: A Distributed Storage System for Structured Data | https://research.google.com/archive/bigtable-osdi06.pdf | |
8 | Apache Impala | Impala: A Modern, Open-Source SQL Engine for Hadoop | https://www.cidrdb.org/cidr2015/Papers/CIDR15_Paper28.pdf | |
9 | Apache Druid | Druid: A Real-time Analytical Data Store | http://static.druid.io/docs/druid.pdf | |
10 | Timer Wheel | Hashed and Hierarchical Timing Wheels | http://www.cs.columbia.edu/~nahum/w6998/papers/sosp87-timing-wheels.pdf | |
11 | MillWheel | MillWheel: Fault-Tolerant Stream Processing at Internet Scale | https://research.google.com/pubs/archive/41378.pdf | |
12 | Dynamo | Dynamo: Amazon’s Highly Available Key-value Store | https://www.allthingsdistributed.com/files/amazon-dynamo-sosp2007.pdf | |
13 | Google File System | The Google File System | https://research.google.com/archive/gfs-sosp2003.pdf | |
14 | MapReduce | MapReduce: Simplified Data Processing on Large Clusters | https://research.google.com/archive/mapreduce-osdi04.pdf | |
15 | Spanner | Spanner: Google’s Globally-Distributed Database | https://research.google.com/archive/spanner-osdi2012.pdf | |
16 | Zab | Zab: High-performance broadcast for primary-backup systems | http://www.cs.cornell.edu/courses/cs6452/2012sp/papers/zab-ieee.pdf | |
17 | Paxos | Paxos Made Simple | https://lamport.azurewebsites.net/pubs/paxos-simple.pdf | |
18 | Chubby | The Chubby lock service for loosely-coupled distributed systems | https://research.google.com/archive/chubby-osdi06.pdf | |
19 | Dremel | Dremel: Interactive Analysis of Web-Scale Datasets | https://research.google/pubs/pub36632/ | |
20 | Megastore | Megastore: Providing Scalable, Highly Available Storage for Interactive Services | https://research.google/pubs/pub36971.pdf | |
21 | Raft | In Search of an Understandable Consensus Algorithm (Extended Version) | https://raft.github.io/raft.pdf | |
22 | Flexible Paxos | Flexible Paxos: Quorum Intersection Revisited | https://arxiv.org/abs/1608.06696 | |
23 | Thrift | Thrift: Scalable Cross-Language Services Implementation | https://thrift.apache.org/static/files/thrift-20070401.pdf | |
24 | Maglev | Maglev: A Fast and Reliable Software Network Load Balancer | https://research.google.com/pubs/archive/44824.pdf | |
25 | LSM | The Log-Structured Merge-Tree (LSM-Tree) | https://www.cs.umb.edu/~poneil/lsmtree.pdf | |
26 | Chord | Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications | https://pdos.csail.mit.edu/papers/chord:sigcomm01/chord_sigcomm.pdf | |
27 | Kademlia | Kademlia: A Peer-to-peer Information System Based on the XOR Metric | https://www.scs.stanford.edu/~dm/home/papers/kpos.pdf | |
28 | Mesa | Mesa: Geo-Replicated, Near Real-Time, Scalable Data Warehousing | https://research.google/pubs/pub42851/ | |
29 | SCRIBE | SCRIBE: A large-scale and decentralized application-level multicast infrastructure | https://rowstron.azurewebsites.net/PAST/jsac.pdf | |
30 | PAST | Storage management and caching in PAST, a large-scale, persistent peer-to-peer storage utility | https://people.mpi-sws.org/~druschel/publications/PAST-hotos.pdf | |
31 | Pastry | Pastry: Scalable, decentralized object location and routing for large-scale peer-to-peer systems | https://www.cs.cornell.edu/people/egs/615/pastry.pdf | |
32 | Linearizability | Linearizability: A Correctness Condition for Concurrent Objects | http://cs.brown.edu/~mph/HerlihyW90/p463-herlihy.pdf | |
33 | Time and Clocks | Time, Clocks, and the Ordering of Events in a Distributed System | http://lamport.azurewebsites.net/pubs/time-clocks.pdf | |
34 | CRDTs | CRDTs: Consistency without concurrency control | http://hal.archives-ouvertes.fr/docs/00/39/79/81/PDF/RR-6956.pdf | |
35 | Photon | Photon: Fault-tolerant and Scalable Joining of Continuous Data Streams | https://research.google/pubs/pub41318/ | |
36 | TAO | TAO: Facebook’s Distributed Data Store for the Social Graph | https://www.usenix.org/system/files/conference/atc13/atc13-bronson.pdf | |
37 | Pregel | Pregel: A System for Large-Scale Graph Processing | https://15799.courses.cs.cmu.edu/fall2013/static/papers/p135-malewicz.pdf | |
38 | Dapper | Dapper, a Large-Scale Distributed Systems Tracing Infrastructure | https://research.google/pubs/pub36356.pdf | |
39 | Raft Refloated | Raft Refloated: Do We Have Consensus? | https://www.cl.cam.ac.uk/~ms705/pub/papers/2015-osr-raft.pdf | |
40 | Percolator | Large-scale Incremental Processing Using Distributed Transactions and Notifications | https://research.google/pubs/pub36726.pdf | |
41 | Monarch | Monarch: Google’s Planet-Scale In-Memory Time Series Database | https://research.google/pubs/pub50652/ | |
42 | Borg | Large-scale cluster management at Google with Borg | https://research.google/pubs/pub43438.pdf | |
43 | Borg - Next | Borg: the Next Generation | https://research.google/pubs/pub49065.pdf | |
44 | Amazon Aurora | Amazon Aurora: Design Considerations for High Throughput Cloud-Native Relational Databases | https://web.stanford.edu/class/cs245/readings/aurora.pdf | |
45 | Gorilla | Gorilla: A Fast, Scalable, In-Memory Time Series Database | http://www.vldb.org/pvldb/vol8/p1816-teller.pdf | |
46 | HDFS | The Hadoop Distributed File System | https://storageconference.us/2010/Papers/MSST/Shvachko.pdf | |
47 | Autopilot | Autopilot: workload autoscaling at Google | https://dl.acm.org/doi/10.1145/3342195.3387524 | |
48 | Consistent hashing | Consistent hashing and random trees: distributed caching protocols for relieving hot spots on the World Wide Web | https://dl.acm.org/doi/pdf/10.1145/258533.258660 | |
49 | SEDA | SEDA: An Architecture for Well-Conditioned, Scalable Internet Services | http://www.sosp.org/2001/papers/welsh.pdf | |
50 | Bitcask | Bitcask: A Log-Structured Hash Table for Fast Key/Value Data | https://riak.com/assets/bitcask-intro.pdf | |
51 | DynamoDB | Amazon DynamoDB: A Scalable, Predictably Performant, and Fully Managed NoSQL Database Service | https://www.usenix.org/system/files/atc22-elhemali.pdf | |
52 | Isolation levels | A critique of ANSI SQL isolation levels | https://dl.acm.org/doi/pdf/10.1145/223784.223785 | |
54 | Deletable Bloom Filter | The deletable bloom filter | https://arxiv.org/pdf/1005.0352 | |
55 | Hash Coding | Space/Time Trade-offs in Hash Coding with Allowable Errors | https://dl.acm.org/doi/pdf/10.1145/362686.362692 | |
56 | Expedite Byzantine | Shifting Gears: Changing Algorithms on the Fly to Expedite Byzantine Agreement | https://www.sciencedirect.com/science/article/pii/089054019290035E | |
57 | Scalability cost | Scalability! But at what COST? | https://www.usenix.org/system/files/conference/hotos15/hotos15-paper-mcsherry.pdf | |
58 | Foundation DB | FoundationDB: A Distributed Unbundled Transactional Key Value Store | https://www.foundationdb.org/files/fdb-paper.pdf | |
59 | Monolith | Monolith: Real Time Recommendation System With Collisionless Embedding Table | https://arxiv.org/pdf/2209.07663 | |
60 | Memcache at Facebook | Scaling Memcache at Facebook | https://www.usenix.org/system/files/conference/nsdi13/nsdi13-final170_update.pdf | |
61 | MilliSampler | A microscopic view of bursts, buffer contention, and loss in data centers | https://dl.acm.org/doi/pdf/10.1145/3517745.3561430 | https://engineering.fb.com/2023/04/17/networking-traffic/millisampler-network-traffic-analysis/ |
62 | FlexiRaft | FlexiRaft: Flexible Quorums with Raft | https://www.cidrdb.org/cidr2023/papers/p83-yadav.pdf | |
63 | Minesweeper | Scalable Statistical Root Cause Analysis on App Telemetry | https://arxiv.org/abs/2010.09974 | |
64 | Shard Manager | Shard Manager: A Generic Shard Management Framework for Geo-distributed Applications | | |
65 | FlumeJava | FlumeJava: Easy, Efficient Data-Parallel Pipelines | https://research.google/pubs/pub35650.pdf | |
66 | Heron | Twitter Heron: Stream Processing at Scale | https://dl.acm.org/doi/pdf/10.1145/2723372.2742788 | |
67 | Dataflow | The Dataflow Model: A Practical Approach to Balancing Correctness, Latency, and Cost in Massive-Scale, Unbounded, Out-of-Order Data Processing | https://research.google/pubs/pub43864.pdf | |
68 | Flink | State Management in Apache Flink | http://www.vldb.org/pvldb/vol10/p1718-carbone.pdf | |
69 | Dgraph | Dgraph: Synchronously Replicated, Transactional and Distributed Graph Database | | |