airflow

Introduction

This repository accompanies my blog post about Airflow, which is hosted at:

https://www.obytes.com/blog/getting-started-with-apache-airflow

Build and Run


docker-compose build
docker-compose up -d
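
Once the containers are up, you can check that everything started correctly (the airflow_webserver service name matches the docker-compose commands used further below; the exposed port depends on your docker-compose*.yml, 8080 being the Airflow default):

docker-compose ps
docker-compose logs -f airflow_webserver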


Usage

By default, docker-airflow runs Airflow with the SequentialExecutor. You can use the CeleryExecutor instead by setting an environment variable: AIRFLOW__CORE__EXECUTOR=CeleryExecutor
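
For example, you can put the variable in the env file referenced by your docker-compose*.yml, or pass it as a one-off override on the command line (the airflow_webserver service name is the one used in the examples further below):

AIRFLOW__CORE__EXECUTOR=CeleryExecutor

docker-compose run --rm -e AIRFLOW__CORE__EXECUTOR=CeleryExecutor airflow_webserver airflow list_dags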

NB: If you want the example DAGs to be loaded (the default is False), you have to set the following environment variable:

AIRFLOW__CORE__LOAD_EXAMPLES=True

If you want to use Ad hoc query, make sure you've configured connections: go to Admin -> Connections, edit "postgres_default" and set these values (equivalent to the values in docker-compose*.yml):

  • Host : airflow_db
  • Schema : airflow
  • Login : postgres
  • Password : postgres
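
These four values are simply the components of the metadata database URI defined in the compose file; put together they match the connection string used in the configuration section below:

postgres://postgres:postgres@airflow_db:5432/airflow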

For encrypted connection passwords (with the Local or Celery executor), all Airflow containers must share the same fernet_key. By default Airflow generates a new fernet_key at startup, so you have to pin it with an environment variable in the docker-compose env file:
AIRFLOW__CORE__FERNET_KEY=fernet_key

You can generate a suitable key with:

docker exec -ti container_id python -c "from cryptography.fernet import Fernet; FERNET_KEY = Fernet.generate_key().decode(); print(FERNET_KEY)"
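
The command prints a random key; copy it into the docker-compose env file so that every Airflow container (webserver, scheduler, workers) sees the same value. The value below is only a placeholder:

AIRFLOW__CORE__FERNET_KEY=<paste_the_generated_key_here>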

Configuring Airflow

It's possible to set any configuration value for Airflow from environment variables, which take precedence over the values in airflow.cfg.

The general rule is that the environment variable should be named AIRFLOW__<section>__<key>; for example, AIRFLOW__CORE__SQL_ALCHEMY_CONN sets the sql_alchemy_conn option in the [core] section, e.g. AIRFLOW__CORE__SQL_ALCHEMY_CONN=postgres://postgres:postgres@airflow_db:5432/airflow.
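
A few more illustrative overrides following the same pattern (the values are arbitrary examples, not defaults of this setup):

AIRFLOW__CORE__PARALLELISM=16
AIRFLOW__WEBSERVER__BASE_URL=http://localhost:8080
AIRFLOW__SMTP__SMTP_HOST=smtp.example.com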

Check out the Airflow documentation for more details

UI Links

Running other airflow commands

If you want to run other airflow sub-commands, such as list_dags or clear, you can do so like this:

docker run --rm -ti container_id airflow list_dags

or, with your docker-compose setup, like this:

docker-compose run --rm airflow_webserver airflow list_dags

You can also use this to run a bash shell or any other command in the same environment that airflow would be run in:

docker run --rm -ti container_id bash
docker run --rm -ti container_id ipython
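
Any other Airflow CLI sub-command works the same way; for instance, to list the tasks of a DAG or run a single task for a given execution date (the dag and task ids below are placeholders, and the sub-command names match the Airflow 1.x CLI used by list_dags above):

docker-compose run --rm airflow_webserver airflow list_tasks <dag_id>
docker-compose run --rm airflow_webserver airflow test <dag_id> <task_id> 2019-01-01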

Simplified SQL database configuration using PostgreSQL

If the executor type is set to anything other than SequentialExecutor, you'll need an SQL database. The PostgreSQL configuration variables defined in the compose files (and their default values) are used to compute AIRFLOW__CORE__SQL_ALCHEMY_CONN and AIRFLOW__CELERY__RESULT_BACKEND.

You can also use those variables to adapt your compose file to match an existing PostgreSQL instance managed elsewhere.
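
As a sketch, assuming the compose file exposes variables named POSTGRES_HOST, POSTGRES_PORT, POSTGRES_USER, POSTGRES_PASSWORD and POSTGRES_DB (check docker-compose*.yml for the exact names used in this repository), pointing Airflow at an external instance would look like this:

POSTGRES_HOST=pg.example.internal
POSTGRES_PORT=5432
POSTGRES_USER=airflow
POSTGRES_PASSWORD=airflow
POSTGRES_DB=airflow

which would typically be expanded into:

AIRFLOW__CORE__SQL_ALCHEMY_CONN=postgres://airflow:airflow@pg.example.internal:5432/airflow
AIRFLOW__CELERY__RESULT_BACKEND=db+postgresql://airflow:airflow@pg.example.internal:5432/airflow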

One important thing to consider: when specifying the connection as a URI (in an AIRFLOW_CONN_* variable), you should follow the standard syntax of DB connection URIs.
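
For example, the postgres_default connection from the Ad hoc query section above could be provided as an environment variable instead of being edited in the UI (Airflow looks up a connection id by checking for an AIRFLOW_CONN_<CONN_ID> variable, with the id upper-cased):

AIRFLOW_CONN_POSTGRES_DEFAULT=postgres://postgres:postgres@airflow_db:5432/airflow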