The Airflow POC project aims to demonstrate the capabilities and benefits of using Apache Airflow, an open-source platform to programmatically author, schedule, and monitor workflows. This project serves as a starting point for exploring Airflow's features and understanding how it can be integrated into your data pipeline.
- Use this command to download the official docker-compose.yaml:
curl -LfO 'https://airflow.apache.org/docs/apache-airflow/2.8.0/docker-compose.yaml'
- Run these commands to create the directories Airflow expects and an .env file that records your host user ID (so files written by the containers stay owned by you):
mkdir -p ./dags ./logs ./plugins ./config ./data
echo -e "AIRFLOW_UID=$(id -u)" > .env
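The two steps above can be run together as one script; the checks at the end confirm the layout came out as expected (the `AIRFLOW_UID` value will differ per host):

```shell
# Create the folders the Airflow compose file mounts into the containers
mkdir -p ./dags ./logs ./plugins ./config ./data

# Record the host user's UID so container-written files keep your ownership
echo "AIRFLOW_UID=$(id -u)" > .env

# Sanity check: every directory exists and .env has the expected key
for d in dags logs plugins config data; do
    [ -d "./$d" ] || echo "missing: $d"
done
grep '^AIRFLOW_UID=' .env
```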
- Create a Dockerfile with this content; it extends the base image with your extra Python dependencies:
FROM apache/airflow:2.8.0
USER airflow
COPY requirements.txt /tmp/requirements.txt
RUN pip install --no-cache-dir --user -r /tmp/requirements.txt
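The Dockerfile above expects a requirements.txt file next to it. As a sketch, it can be generated like this; the pinned packages are placeholders for illustration, not dependencies this project actually needs:

```shell
# Write a minimal requirements.txt; swap in the libraries your DAGs need
cat > requirements.txt <<'EOF'
pandas==2.1.4
requests==2.31.0
EOF
```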
- Use this command to build the extended image. Reusing the upstream tag means the compose file, which defaults to `apache/airflow:2.8.0`, picks up your local build without any edits:
docker build -t apache/airflow:2.8.0 .
- Update the `volumes:` section of docker-compose.yaml with these lines. They bind-mount your local directories into the containers; the `data` mount is the addition over the defaults:
volumes:
- ${AIRFLOW_PROJ_DIR:-.}/dags:/opt/airflow/dags
- ${AIRFLOW_PROJ_DIR:-.}/logs:/opt/airflow/logs
- ${AIRFLOW_PROJ_DIR:-.}/data:/opt/airflow/data
- ${AIRFLOW_PROJ_DIR:-.}/config:/opt/airflow/config
- ${AIRFLOW_PROJ_DIR:-.}/plugins:/opt/airflow/plugins
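The `${AIRFLOW_PROJ_DIR:-.}` prefix on each mount uses the same `${VAR:-default}` syntax as shell default expansion: if `AIRFLOW_PROJ_DIR` is unset or empty, the current directory (`.`) is used instead. A quick shell sketch of that behavior:

```shell
# With the variable unset, the mount path resolves relative to the current dir
unset AIRFLOW_PROJ_DIR
default_path="${AIRFLOW_PROJ_DIR:-.}/dags"

# With the variable set, its value wins (the path here is just an example)
AIRFLOW_PROJ_DIR=/opt/airflow-poc
custom_path="${AIRFLOW_PROJ_DIR:-.}/dags"

echo "$default_path"
echo "$custom_path"
```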
- Run this command to initialize the metadata database and create the default admin account:
docker compose up airflow-init
- Finally, use this command to start all Airflow services (add `-d` to run them in the background):
docker compose up