
Curated list of resources about Apache Airflow

Apache License 2.0

Awesome Apache Airflow

This is a curated list of resources about Apache Airflow. Please feel free to contribute any items that should be included. Items are generally added at the top of each section so that newer items are featured more prominently.

Contents

Vital links

Airflow deployment solutions

Introductions and tutorials

Best practices, lessons learned and cool use cases

Blogs, etc.

  • The Airflow Podcast - A semiregular podcast discussing all things Airflow.
  • Maxime Beauchemin - Maxime's blog on Medium, which gives insight into the philosophy behind Apache Airflow.
  • Robert Chang - Blog posts about data engineering with Apache Airflow that explain the rationale and include code examples.

Slide deck presentations and online videos

Libraries, Hooks, Utilities

  • Airflow plugins - Central collection of repositories of various plugins for Airflow, including mailchimp, trello, sftp, github, etc.
  • fileflow - Collection of modules to support large data transfers between Airflow operators through either the local file system or S3. This addresses a gap where data is too large for XComs but too small or inconvenient for loading directly in the operator. Built by Industry Dive.
  • fairflow - Library to abstract away Airflow's Operators with functional pieces that transform the data from one operator to another.
  • airflow-maintenance-dags - Clairvoyant has a repo of Airflow DAGs that operate on Airflow itself, clearing out various bits of the backing metadata store.
  • test_dags - A more complete solution for DAG integrity tests, the first circle of Data’s Inferno.
  • dag-factory - A library for dynamically generating Apache Airflow DAGs from YAML configuration files.
  • whirl - Fast iterative local development and testing of Apache Airflow workflows.

Meetups

Commercial Airflow-as-a-service providers

  • Google Cloud Composer - Google Cloud Composer is a managed service built atop Google Cloud and Airflow.
  • Qubole - Qubole is mainly known as a service-and-support company for Apache Hive, but also provides Airflow as a component of its platform.
  • Astronomer.io - Astronomer provides complete ETL lifecycle solutions and appears to be entirely focused on providing Airflow-based products.

Non-English resources