Classic Big Data Papers
- One SQL to Rule Them All: An Efficient and Syntactically Idiomatic Approach to Management of Streams and Tables
- The Dataflow Model: A Practical Approach to Balancing Correctness, Latency, and Cost in Massive-Scale, Unbounded, Out-of-Order Data Processing
- Monitoring Streams – A New Class of Data Management Applications
- Exploiting Punctuation Semantics in Continuous Data Streams
- STREAM: The Stanford Data Stream Management System
- The 8 Requirements of Real-Time Stream Processing
- The Design of the Borealis Stream Processing Engine
- High-Availability Algorithms for Distributed Stream Processing
- A Cooperative, Self-Configuring High-Availability Solution for Stream Processing
- Out-of-Order Processing: A New Architecture for High-Performance Stream Systems
- Fast and Highly-Available Stream Processing over Wide Area Networks
- S4: Distributed Stream Computing Platform
- Discretized Streams: Fault-Tolerant Streaming Computation at Scale
- MillWheel: Fault-Tolerant Stream Processing at Internet Scale
- Integrating Scale Out and Fault Tolerance in Stream Processing using Operator State Management
- Trill: A High-Performance Incremental Query Processor for Diverse Analytics
- Summingbird: A Framework for Integrating Batch and Online MapReduce Computations
- Drizzle: Fast and Adaptable Stream Processing at Scale
- Realtime Data Processing at Facebook
- Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing
- Spark: Cluster Computing with Working Sets
- Apache Flink™: Stream and Batch Processing in a Single Engine
- Lightweight Asynchronous Snapshots for Distributed Dataflows
- State Management in Apache Flink
- MapReduce: Simplified Data Processing on Large Clusters
- Bigtable: A Distributed Storage System for Structured Data
- SQL-on-Hadoop: Full Circle Back to Shared-Nothing Database Architectures