airflow

Apache Airflow sample setup with Kubernetes.

It spins up a local Kubernetes cluster for Airflow using Kind.

This is meant as a sample local setup. To run Airflow in a production environment, refer to the Airflow Helm chart production guide.

Requirements

Kind, Docker, and Helm are required to run the local Kubernetes cluster.

Instructions

The repo includes a Makefile. You can run make help to see usage.

Basic setup:

  • Run make k8s-cluster-up to spin up local Kubernetes cluster with Kind.
  • Run make airflow-k8s-add-helm-chart to add the official Airflow Helm chart to the local repo.
  • Run make airflow-k8s-create-namespace to create a namespace for the Airflow deployment.
  • Run make airflow-k8s-up to deploy Airflow on the local Kubernetes cluster.
  • In a separate terminal, run make airflow-webserver-port-forward to access the Airflow webserver at http://localhost:8080.

The credentials for the webserver are admin/admin.

Configuration

If you need to customize the Airflow configuration, edit the config section in values.yaml.

Environment variables can also be added in the env section; they will be available in all the pods.
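As an illustration, a customization of these sections might look like the following sketch (the specific config override and the AIRFLOW_VAR_MY_VAR variable are hypothetical examples, not values used by this repo):

```yaml
# values.yaml (excerpt, illustrative)
config:
  core:
    # Example override of an Airflow setting.
    max_active_runs_per_dag: "4"

env:
  # Example variable; it will be present in all the Airflow pods.
  - name: AIRFLOW_VAR_MY_VAR
    value: "some-value"
```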

The complete values.yaml of the source Helm chart can be seen here.

DAG deployment

DAGs are deployed via GitSync.

GitSync runs as a sidecar container alongside the Airflow pods, synchronizing the dags/ folder in the pods with the DAGs located in a Git repo of your choice (in this case https://github.com/guidok91/airflow/tree/master/dags).
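In the chart's values.yaml, this corresponds roughly to the dags.gitSync section; a sketch (the branch and subPath values are assumptions about this repo's layout):

```yaml
# values.yaml (excerpt, illustrative)
dags:
  gitSync:
    enabled: true
    repo: https://github.com/guidok91/airflow.git
    branch: master
    # Folder within the repo that contains the DAG files.
    subPath: dags
```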

Custom Docker image for pods

A custom Docker image is provided for the pods, where any additional Airflow dependencies can be installed.
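A minimal sketch of such an image (the base image tag and the requirements.txt file are assumptions, not necessarily what this repo uses):

```dockerfile
# Dockerfile (sketch)
FROM apache/airflow:2.7.1

# Install extra Python dependencies needed by the DAGs.
COPY requirements.txt /requirements.txt
RUN pip install --no-cache-dir -r /requirements.txt
```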

Logs

So that Airflow logs are not lost every time a task finishes (and its pod gets deleted), the setup provides a PersistentVolume that shares the logs with the host system in the data/ folder.
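With the official chart, persisting task logs is typically configured via the logs.persistence section of values.yaml; a sketch (the existingClaim name is an assumption):

```yaml
# values.yaml (excerpt, illustrative)
logs:
  persistence:
    enabled: true
    # Bind to a pre-created PersistentVolumeClaim backed by a volume
    # mapped to the data/ folder on the host.
    existingClaim: airflow-logs
```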