cratedb-airflow-tutorial

Reference implementations for orchestration project using Astronomer/Airflow

Primary language: Python. License: GNU Affero General Public License v3.0 (AGPL-3.0).

crate-airflow-tutorial

Orchestration Project - Astronomer/Airflow tutorials

This repository contains examples of Apache Airflow DAGs for automating recurring queries. All DAGs run on Astronomer infrastructure installed on Ubuntu 20.04.3 LTS.

Installation

Before running the examples, set up the following environment:

Astronomer

Astronomer is a managed provider that lets users run and monitor Apache Airflow environments. The recommended way to initialize and run projects on Astronomer is the Astronomer CLI. To install its latest version on Ubuntu, run:

curl -sSL https://install.astronomer.io | sudo bash

To verify that the Astronomer CLI is installed, run:

astro version
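If the same check has to run from a script, for example in a setup helper, it could be sketched in Python as follows. This is a minimal sketch and not part of the repository: the helper names `astro_cli_available` and `astro_version` are hypothetical, and the only assumption taken from the text above is that the binary is called `astro`.

```python
import shutil
import subprocess
from typing import Optional


def astro_cli_available(binary: str = "astro") -> bool:
    """Return True if the given CLI binary can be found on PATH."""
    return shutil.which(binary) is not None


def astro_version(binary: str = "astro") -> Optional[str]:
    """Run '<binary> version' and return its output, or None if not installed."""
    if not astro_cli_available(binary):
        return None
    result = subprocess.run(
        [binary, "version"], capture_output=True, text=True, check=False
    )
    return result.stdout.strip()
```

Calling `astro_version()` returns `None` when the CLI is missing, so a setup script can print a hint to install it instead of failing with a traceback.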

For installation of Astronomer CLI on another operating system, please refer to the official documentation.

Project files

The project directory has the following file structure:

  ├── dags # directory containing all DAGs
  ├── include # additional files which are used in DAGs
  ├── .astro # project settings
  ├── Dockerfile # runtime overrides for Astronomer Docker image
  ├── packages.txt # specification of OS-level packages
  ├── plugins # custom or community Airflow plugins
  ├── setup # additional setup-related scripts/database schemas
  └── requirements.txt # specification of Python packages
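
A quick way to confirm that a checkout matches this layout is a small script such as the one below. It is only a sketch: the function name `missing_entries` is hypothetical, and the split into directories and files is an assumption inferred from the names in the tree above.

```python
from pathlib import Path
from typing import List

# Entries taken from the tree above. Which ones are directories and which
# are files is an assumption based on their names.
EXPECTED_DIRS = ("dags", "include", ".astro", "plugins", "setup")
EXPECTED_FILES = ("Dockerfile", "packages.txt", "requirements.txt")


def missing_entries(project_root: str) -> List[str]:
    """Return the expected entries that are absent under project_root."""
    root = Path(project_root)
    missing = [d for d in EXPECTED_DIRS if not (root / d).is_dir()]
    missing += [f for f in EXPECTED_FILES if not (root / f).is_file()]
    return missing
```

An empty list from `missing_entries(".")` means every entry listed above is present in the current directory.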

In the dags directory you can find the specification of all DAGs for our examples. Each DAG is accompanied by a tutorial.

Start the project

To start the project on your local machine, run:

astro dev start

To access the Apache Airflow UI go to http://localhost:8081.

From the Airflow UI, you can manage running DAGs and check their status, the times of their last and next runs, and other metadata.
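
To check from a script whether the local environment is up, one option is to poll the webserver's health endpoint. The sketch below assumes an Airflow 2.x webserver that serves a JSON document at `/health` with `metadatabase` and `scheduler` entries; the endpoint path may differ in other versions, and the function names are hypothetical.

```python
import http.client
import json
import urllib.error
import urllib.request

# Components that must report "healthy" (assumed Airflow 2.x payload shape).
REQUIRED_COMPONENTS = ("metadatabase", "scheduler")


def is_healthy(payload) -> bool:
    """True if all required components in the payload report 'healthy'."""
    if not isinstance(payload, dict):
        return False
    return all(
        (payload.get(name) or {}).get("status") == "healthy"
        for name in REQUIRED_COMPONENTS
    )


def fetch_health(base_url: str = "http://localhost:8081", timeout: float = 5.0):
    """Return the parsed health payload, or None if the UI is unreachable."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return json.load(resp)
    except (urllib.error.URLError, OSError, ValueError, http.client.HTTPException):
        return None
```

`fetch_health()` returns `None` while the containers are still starting, so a caller can retry until `is_healthy(...)` becomes true.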

Docker BuildKit issue

If your Docker environment has the BuildKit feature enabled, you may run into an error when starting the Astronomer project:

$ astro dev start
Env file ".env" found. Loading...
buildkit not supported by daemon
Error: command 'docker build -t astronomer-project_dccf4f/airflow:latest failed: failed to execute cmd: exit status 1

To overcome this issue, start Astronomer without the BuildKit feature: DOCKER_BUILDKIT=0 astro dev start (see the Astronomer Forum).
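
When the project is started from a Python helper rather than a shell, the same workaround amounts to passing a modified environment to the child process. The following is a sketch under that assumption; `env_without_buildkit` and `start_without_buildkit` are hypothetical names, not part of the repository or the Astronomer CLI.

```python
import os
import subprocess
from typing import Dict, Mapping, Optional


def env_without_buildkit(base_env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return a copy of the environment with DOCKER_BUILDKIT set to '0'."""
    env = dict(os.environ if base_env is None else base_env)
    env["DOCKER_BUILDKIT"] = "0"
    return env


def start_without_buildkit() -> None:
    """Equivalent to running: DOCKER_BUILDKIT=0 astro dev start"""
    subprocess.run(["astro", "dev", "start"], env=env_without_buildkit(), check=True)
```

Copying the environment instead of mutating `os.environ` keeps the setting local to the `astro` process, which mirrors the one-off variable prefix in the shell command.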

Code linting

Before opening a pull request, please run pylint and black. To install all dependencies, run:

python -m pip install --upgrade -e ".[develop]"
python -m pip install --upgrade -r requirements.txt

Then run pylint and black using:

python -m pylint dags
python -m black .

Testing

Pytest is used for automated testing of DAGs. To set up test infrastructure locally, run:

python -m pip install --upgrade -e ".[testing]"

Tests can be run via:

python -m pytest -vvv
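
For illustration, a very small DAG test could look like the sketch below. It is not taken from the repository's test suite: it only checks that every Python file under a `dags` directory parses, using the standard library, whereas real DAG tests would more likely load the DAGs through Airflow itself (for example via its `DagBag` class). The names `find_syntax_errors` and `test_dag_files_parse` are hypothetical.

```python
import ast
from pathlib import Path
from typing import List


def find_syntax_errors(dag_dir: str = "dags") -> List[str]:
    """Return 'file: message' entries for DAG files that fail to parse."""
    errors = []
    for path in sorted(Path(dag_dir).glob("**/*.py")):
        try:
            ast.parse(path.read_text(encoding="utf-8"), filename=str(path))
        except SyntaxError as exc:
            errors.append(f"{path.name}: {exc.msg}")
    return errors


def test_dag_files_parse():
    """Collected by pytest: fails if any DAG file has a syntax error."""
    assert find_syntax_errors() == []
```

Because pytest collects any function named `test_*`, saving this in a test module and running the command above from the project root would include it in the run.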