This repo shows how to run Airflow locally, with a few example DAGs.
In general, Apache Airflow is like a crontab on steroids. It's a pipeline framework that can be used for ETL processing, training models, or any task that needs to run at a certain frequency. The framework lets you run multiple jobs across different workers and specify dependencies between tasks.
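To make the "pipeline with dependencies" idea concrete, here is a minimal sketch of a DAG with two dependent tasks. The DAG id, schedule, and task callables are placeholders for illustration and are not taken from this repo.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract():
    # Placeholder extract step (illustrative only).
    return "raw data"


def load():
    # Placeholder load step (illustrative only).
    print("loading data")


# Hypothetical DAG: runs daily and enforces extract -> load ordering.
with DAG(
    dag_id="example_minimal_pipeline",  # illustrative name, not a DAG in this repo
    start_date=datetime(2023, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    load_task = PythonOperator(task_id="load", python_callable=load)

    extract_task >> load_task  # load runs only after extract succeeds
```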
There are a few options for deploying Airflow:
- docker-compose: great for local development
- kubernetes / helm: best for a production use case
- Cloud Virtual Machine: not the most stable, but a very cheap alternative to most $1000-per-month options
I use a Makefile a lot in my code to consolidate most of the docker-compose commands you need to run. The make startup command shown below will provision the database, webserver, and scheduler locally for you. You will need the following tools installed:
- make
- watch
- docker
The docker compose file for this quickstart is very similar to what is provided by the Airflow team. I have made the following modification:
- Added a volume mount for the script directory.
- Start the docker-compose containers.
# Start docker-compose via Makefile command.
make startup
The above command sets up the Postgres metadata database, webserver, scheduler, and workers. It may take a few moments for everything to initialize.
- Visit http://localhost:8080/ to interact with the webserver and the sample DAGs.
username: airflow
password: airflow
- Turn on the "setup_dag" DAG to set up the Airflow connections (a sketch of what such a DAG can do follows below).
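For reference, a connection-bootstrapping DAG along these lines can register connections programmatically. The sketch below is only an assumption about how such a DAG might look, not the repo's actual setup_dag; the connection id, type, host, and credentials are placeholders.

```python
from datetime import datetime

from airflow import DAG
from airflow.models import Connection
from airflow.operators.python import PythonOperator
from airflow.settings import Session


def create_postgres_connection():
    """Register a Postgres connection if it does not already exist (illustrative values)."""
    session = Session()
    conn_id = "my_postgres"  # hypothetical connection id
    if not session.query(Connection).filter(Connection.conn_id == conn_id).first():
        session.add(
            Connection(
                conn_id=conn_id,
                conn_type="postgres",
                host="postgres",  # assumes the docker-compose service name
                schema="airflow",
                login="airflow",
                password="airflow",
                port=5432,
            )
        )
        session.commit()
    session.close()


# Hypothetical bootstrap DAG, triggered manually once after startup.
with DAG(
    dag_id="example_setup_connections",
    start_date=datetime(2023, 1, 1),
    schedule_interval=None,
    catchup=False,
) as dag:
    PythonOperator(
        task_id="create_postgres_connection",
        python_callable=create_postgres_connection,
    )
```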
You can view running containers after startup.
# view the running containers
make containers
I removed all of the template example DAGs from the Airflow local deployment and loaded my own examples.
These are DAGs that I have written using best practices:
- python_operator: A DAG that runs a Python script.
- bash_dag: A DAG that runs a bash script.
- branch_dag: A DAG that branches based on a task output (see the sketch after this list).
- datetime_dag: A DAG that branches based on time.
- postgres_dag: A DAG that loads data into Postgres.
- custom_schedule_dag: A DAG that runs on a predefined external schedule. (pending)
- dag_factory_dag: An example of a DAG factory using YAML.
- templated dag: A DAG that is templated over multiple office locations.
- sla_dag: A DAG with a defined SLA. (pending)
- sensor_dag: A DAG that uses task sensors.
- xcom_dag: A DAG that uses XCom to pass variables.
- pip_dag: A DAG that exports the Airflow pip requirements and sys paths (sneaky).
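To give a flavour of these examples, here is a minimal sketch of the branching pattern used by a DAG like branch_dag. The task ids and the branching condition are placeholders for illustration, not the exact contents of the repo's DAG.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.empty import EmptyOperator
from airflow.operators.python import BranchPythonOperator


def choose_branch(**context):
    # Placeholder condition: branch on the day of the run's logical date.
    if context["logical_date"].weekday() < 5:
        return "weekday_task"
    return "weekend_task"


# Hypothetical branching DAG; task names are illustrative.
with DAG(
    dag_id="example_branch_pattern",
    start_date=datetime(2023, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    branch = BranchPythonOperator(task_id="choose_branch", python_callable=choose_branch)
    weekday_task = EmptyOperator(task_id="weekday_task")
    weekend_task = EmptyOperator(task_id="weekend_task")

    # Only the task whose id is returned by choose_branch will run; the other is skipped.
    branch >> [weekday_task, weekend_task]
```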
# View Scheduler Logs
docker logs airflow-dags-airflow-scheduler-1
# Bash into the Airflow webserver container
docker exec -it airflow-dags-airflow-webserver-1 /bin/bash
# Backfill
You can now play around with Airflow features in a local environment.
- Build DAGs
- Install Plugins
- Setup Connections
- Monitor Jobs
This setup uses the official Airflow Helm chart for Kubernetes.
# Create airflow namespace
kubectl create namespace airflow
# Adding repo
helm repo add apache-airflow https://airflow.apache.org
# Install helm chart.
helm install airflow apache-airflow/airflow \
--namespace airflow \
--set webserver.livenessProbe.initialDelaySeconds=30
If you would like to monitor how the pods are distributed, feel free to use watch together with kubectl for active in-terminal monitoring.
watch -n 30 kubectl get namespace,deployment,svc,po -A