Data & Systems Engineering White Papers

These are some of the better white papers I've come across. If you are new to reading white papers, I highly recommend starting with "How to Read a Paper" by S. Keshav.

Data

Cassandra - A Decentralized Structured Storage System
Dynamo: Amazon’s Highly Available Key-value Store
Bigtable: A Distributed Storage System for Structured Data
Hive – A Petabyte Scale Data Warehouse Using Hadoop
Kafka: a Distributed Messaging System for Log Processing
MapReduce: Simplified Data Processing on Large Clusters
Megastore: Providing Scalable, Highly Available Storage for Interactive Services
Scaling Memcache at Facebook

Systems

The Anatomy of a Large-Scale Hypertextual Web Search Engine
The Chubby lock service for loosely-coupled distributed systems
The Google File System
TAO: Facebook’s Distributed Data Store for the Social Graph
HyperLogLog in Practice: Algorithmic Engineering of a State of The Art Cardinality Estimation Algorithm
Unicorn: A System for Searching the Social Graph
Web Search for a Planet: The Google Cluster Architecture
ZooKeeper: Wait-free coordination for Internet-scale systems
Thrift: Scalable Cross-Language Services Implementation