awesome-papers

A collection of awesome papers about big data & cloud computing, covering systems such as Spark, Ceph, and Kubernetes.

MIT License

Ceph

  1. Ceph: Reliable, Scalable, and High-performance Distributed Storage
  2. CRUSH: Controlled, Scalable, Decentralized Placement of Replicated Data

Spark

  1. Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing
  2. Tachyon: Reliable, Memory Speed Storage for Cluster Computing Frameworks

Kafka

  1. Kafka: a Distributed Messaging System for Log Processing

Kubernetes

  1. Large-scale cluster management at Google with Borg

Mesos

  1. A Common Substrate for Cluster Computing
  2. Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center

Google

  1. Large-scale cluster management at Google with Borg
  2. Dapper, a Large-Scale Distributed Systems Tracing Infrastructure
  3. Borg, Omega, and Kubernetes (This article describes some of the knowledge gained and lessons learned during Google’s journey from Borg to Kubernetes.)
  4. The Google File System
  5. Bigtable: A Distributed Storage System for Structured Data
  6. MapReduce: Simplified Data Processing on Large Clusters
  7. Dremel: Interactive Analysis of Web-Scale Datasets
  8. Pregel: A System for Large-Scale Graph Processing
  9. Large-scale Incremental Processing Using Distributed Transactions and Notifications (Percolator, one of the backend systems underlying Caffeine)
  10. Similarity Estimation Techniques from Rounding Algorithms

Algorithms

  1. Similarity Estimation Techniques from Rounding Algorithms (Simhash)
  2. CRUSH: Controlled, Scalable, Decentralized Placement of Replicated Data
  3. Consistent Hashing and Random Trees: Distributed Caching Protocols for Relieving Hot Spots on the World Wide Web (Consistent hashing)
  4. The Part-Time Parliament (Paxos)
  5. Paxos Made Simple