/airflow-technical-essentials

Code for the Apache Airflow Technical Essentials live training on O'Reilly

Primary LanguagePython

Airflow Technical Essentials

This repository contains code and examples used alongside the Apache Airflow Technical Essentials live training on O'Reilly.

Prerequisites

Standalone Installation (Single-node setup)

mkdir airflow-training && cd airflow-training
python3 -m venv airflow-venv
source ./airflow-venv/bin/activate
export airflow_python_version=$(python --version | cut -d " " -f 2 | cut -d "." -f 1-2)
export constraint_url="https://raw.githubusercontent.com/apache/airflow/constraints-2.8.0/constraints-${airflow_python_version}.txt"
pip install apache-airflow==2.8.0 --constraint "${constraint_url}"
export AIRFLOW_HOME=$(pwd)
export AIRFLOW__CORE__LOAD_EXAMPLES=False
mkdir dags
cp ../dags/example_dag.py dags/

airflow standalone

docker-compose Installation (Multi-node setup)

echo -e "AIRFLOW_UID=$(id -u)" >> .env

# Initialize the database
docker-compose up airflow-init

# Run airflow
docker-compose up

# user: airflow
# password: airflow

# cleanup
docker-compose down --volumes --remove-orphans

dags/example_dag.py

A basic DAG that introduces the concepts of tasks, operators, dependencies, and the DAG.

dags/weather_checker.py

A more robust DAG that uses variable, connection, xcom, SimpleHTTPOperator, and task context.

dags/query_dag.py

A dag that queries the Airflow metastore using the PostgresOperator.

Accessing the metastore container

docker exec -it $POSTGRES_CONTAINER psql -h postgres -U airflow 
# password is "airflow"

Notes

  • The logical date, formerly known as execution date, is the start of the data interval, not when the DAG is actually executed.
  • Environment variables override airflow.cfg (See .env for examples)