A curated list of awesome resources for Apache Beam

The Unlicense

Awesome Beam

A curated directory of awesome things related to Apache Beam. Inspired by Awesome Flink and Awesome Hadoop.

Packages

Beam Wrappers

Transforms

  • (Pending)

Notebooks

  • Apache Zeppelin - Web-based notebook that enables interactive data analytics with pluggable backends, plotting, etc.

Machine Learning

  • TensorFlow Transform - A library for preprocessing data with TensorFlow. It is built on Beam, so it inherits Beam's portability (i.e. pipelines can run on any supported runner).

Tests

  • (Pending)

Resources

Various resources, such as books, websites and articles.

Official Resources

Community

Books

Courses

Papers

Blogs and Blog Posts

Talks

  • So you want to write a Beam SDK? - Robert Bradshaw [slides] - A talk about the pieces of an SDK and the runner API.
  • Robust, performant and modular APIs for data ingestion with Apache Beam - Eugene Kirpichov, Ismael Mejia [slides] - Important talk about IO, and what we think is the future of IO for Big Data systems.
  • SplittableDoFn: A Transform Developer's perspective - Alex Van Boxel [slides].
  • Large Scale Landuse Classification of Satellite Imagery - Suneel Marthi [slides] [code] - Excellent talk using Beam's Python SDK to run machine learning over a dataset of images.
  • Beam me up, Samza! - The Beam runner for Samza - Xinyu Liu [slides].
  • Python Streaming Pipelines with Beam on Flink - Aljoscha Krettek, Thomas Weise [slides] - A talk about how Beam enables Python pipelines to run on top of Flink.
  • Spark Runner (R)evolution - David Moravek, Ismaël Mejía [slides] - A talk about Spark runner implementation, performance improvements and roadmap.