A curated directory of awesome things related to Apache Beam. Inspired by Awesome Flink and Awesome Hadoop.
- Apache Beam in Kotlin to reduce boilerplate - Uses Kotlin's language features to make the Beam Java SDK less verbose.
- Scio - Scala wrapper for Apache Beam. Wraps Beam functionality in a simple Scala API.
- (Pending)
- Apache Zeppelin - Web-based notebook that enables interactive data analytics with pluggable backends, plotting, etc.
- TensorFlow Transform - A library for preprocessing data with TensorFlow. It uses Beam, so it inherits Beam's portability (i.e., it runs on any supported runner).
- (Pending)
Various resources, such as books, websites and articles.
- Error Handling Elements in Apache Beam Pipelines - A blog post detailing how to handle individual elements that fail during downstream processing.
- Beam Documentation
- Java SDK
- Python SDK
- Go SDK
- Beam Wiki
- Beam Quickstarts: Java, Python, Go.
- The Dataflow Model: A Practical Approach to Balancing Correctness, Latency, and Cost in Massive-Scale, Unbounded, Out-of-Order Data Processing - Paper introducing the Dataflow model, which was the predecessor to Beam. (2015)
- Official Beam Blog
- Streaming 101: The world beyond batch
- Streaming 102: The world beyond batch
- Python Development Environments for Beam on GCP - How to set up a development environment for Python Dataflow jobs.
- Java Development Environments for Beam on GCP - How to set up a development environment for Java Dataflow / Beam jobs.
- Coding Apache Beam in your Web Browser and Running it in Cloud Dataflow - How to create and run a Beam Pipeline on Dataflow using Code Editor.
- Realtime Data Processing with Apache Beam at Dailymotion
- So you want to write a Beam SDK? - Robert Bradshaw [slides] - A talk about the pieces of an SDK and the runner API.
- Robust, performant and modular APIs for data ingestion with Apache Beam - Eugene Kirpichov, Ismael Mejia [slides] - Important talk about IO, and what we think is the future of IO for Big Data systems.
- SplittableDoFn - A Transform Developer's perspective. Alex Van Boxel. [slides].
- Large Scale Landuse Classification of Satellite Imagery - Suneel Marthi [slides] [code] - Excellent talk using Beam's Python SDK to run machine learning over a dataset of images.
- Beam me up, Samza! - The Beam runner for Samza - Xinyu Liu [slides].
- Python Streaming Pipelines with Beam on Flink - Aljoscha Krettek, Thomas Weise [slides] - A talk about how Beam enables Python pipelines to run on top of Flink.