
Airflow Unit Tests and Integration Tests


Airflow Testing

This project contains example tests for each category of Airflow tests described below.

Five Categories of Tests

  1. DAG Validation Tests: To test the validity of the DAG, e.g. checking for typos and cycles.
  2. DAG/Pipeline Definition Tests: To test the total number of tasks in the DAG, the upstream and downstream dependencies of each task, etc.
  3. Unit Tests: To test the logic of custom operators, custom sensors, etc.
  4. Integration Tests: To test the communication between tasks. For example, task 1 passes some information to task 2 using XComs.
  5. End-to-End Pipeline Tests: To test and verify the integration between all tasks of a pipeline. You can also assert on the output data once the E2E pipeline completes successfully.
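Categories 1 and 2 both come down to checks on the DAG's task graph. The sketch below is a minimal, framework-free illustration that works on a plain dict mapping each task_id to the set of its downstream task_ids. In a real Airflow test you would typically load the DAGs with DagBag (checking that its import_errors is empty) and build this mapping from dag.task_dict and each task's downstream_task_ids; the helper names (has_cycle, check_pipeline_definition) and the example task ids are hypothetical.

```python
from collections import deque

# Hypothetical example DAG: task_id -> set of downstream task_ids.
# In Airflow this could be built as
#   {tid: set(t.downstream_task_ids) for tid, t in dag.task_dict.items()}
EXAMPLE_DEPS = {
    "extract": {"transform"},
    "transform": {"load"},
    "load": set(),
}


def has_cycle(deps):
    """Return True if the dependency graph contains a cycle (Kahn's algorithm)."""
    # Count incoming edges for every task, including tasks only seen downstream.
    indegree = {task: 0 for task in deps}
    for downstream in deps.values():
        for task in downstream:
            indegree[task] = indegree.get(task, 0) + 1
    # Start from tasks with no upstream dependencies.
    queue = deque(task for task, n in indegree.items() if n == 0)
    visited = 0
    while queue:
        task = queue.popleft()
        visited += 1
        for nxt in deps.get(task, ()):
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                queue.append(nxt)
    # Tasks that never reach zero incoming edges sit on (or behind) a cycle.
    return visited < len(indegree)


def check_pipeline_definition(deps, expected_task_count, expected_edges):
    """Assert the task count and that each (upstream, downstream) edge exists."""
    assert len(deps) == expected_task_count, (
        f"expected {expected_task_count} tasks, got {len(deps)}"
    )
    for upstream, downstream in expected_edges:
        assert downstream in deps.get(upstream, set()), (
            f"missing dependency {upstream} >> {downstream}"
        )


if __name__ == "__main__":
    assert not has_cycle(EXAMPLE_DEPS)
    check_pipeline_definition(
        EXAMPLE_DEPS, 3, [("extract", "transform"), ("transform", "load")]
    )
    print("DAG checks passed")
```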

Clone this repo to run these tests on your local machine.

Unit Tests

The unit tests cover all tests that fall under the first four categories.
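For categories 3 and 4, a common pattern is to call an operator's execute(context) directly and replace the task instance in the context with a mock, so no scheduler or metadata database is needed. The sketch below assumes a hypothetical custom operator whose execute() pulls a value that task1 pushed via XCom (xcom_pull is the real TaskInstance method). To keep it runnable without Airflow installed, MultiplyOperator is a plain class standing in for a BaseOperator subclass.

```python
import unittest
from unittest import mock


class MultiplyOperator:
    """Hypothetical stand-in for a custom BaseOperator subclass.

    It reads a number that task1 pushed via XCom and returns it multiplied.
    In Airflow, the return value of execute() is pushed to XCom by default.
    """

    def __init__(self, task_id, factor):
        self.task_id = task_id
        self.factor = factor

    def execute(self, context):
        # Pull the value task1 returned/pushed (default key "return_value").
        value = context["ti"].xcom_pull(task_ids="task1")
        return value * self.factor


class MultiplyOperatorTest(unittest.TestCase):
    def test_execute_multiplies_value_from_task1(self):
        # Mock the TaskInstance so the test needs no Airflow metadata DB.
        ti = mock.MagicMock()
        ti.xcom_pull.return_value = 21  # what task1 "pushed"
        op = MultiplyOperator(task_id="task2", factor=2)

        result = op.execute({"ti": ti})

        self.assertEqual(result, 42)
        ti.xcom_pull.assert_called_once_with(task_ids="task1")
```

Run it with python -m unittest or pytest. With a real operator, the same test works unchanged as long as execute() only reads from the context it is given.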

How to run?

  1. Build the Airflow image. Go to the project root directory and run:

    docker build . -t airflow-test

  2. Run the unit tests in a Docker container. Replace {SourceDir} with the directory that contains your clone of the repo (e.g. if you cloned the repo to /User/username/airflow-testing/, then {SourceDir} is /User/username).

    docker run -ti -v {SourceDir}/airflow-testing:/opt --entrypoint /mnt/entrypoint.sh airflow-test run_unit_tests
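If you script this step, {SourceDir} can be derived from the clone location, since it is the parent directory of airflow-testing. A minimal sketch using only the standard library; the function names are hypothetical, and the image name, mount target and entrypoint are copied from the command above.

```python
import shlex
from pathlib import Path


def source_dir_for(repo_path):
    """Return {SourceDir}: the directory that contains the airflow-testing clone."""
    # Path() drops a trailing slash, so ".../airflow-testing/" works too.
    return Path(repo_path).expanduser().resolve().parent


def unit_test_command(repo_path):
    """Build the argv for the `docker run ... run_unit_tests` command above."""
    source_dir = source_dir_for(repo_path)
    return [
        "docker", "run", "-ti",
        "-v", f"{source_dir}/airflow-testing:/opt",
        "--entrypoint", "/mnt/entrypoint.sh",
        "airflow-test", "run_unit_tests",
    ]


if __name__ == "__main__":
    # Example: print the command for a clone in the current directory.
    print(shlex.join(unit_test_command("airflow-testing")))
```

The returned list can be passed straight to subprocess.run, which avoids shell quoting issues with paths containing spaces.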

End-to-End Tests

The end-to-end tests cover category five. To run them, we need to set up an Airflow environment in minikube, along with all the components required by your DAGs.

Minikube setup

Prerequisites:

git clone https://github.com/chandulal/airflow-testing.git
brew install --cask virtualbox   # only if VirtualBox is not already installed

Install minikube

brew install minikube
brew install kubernetes-cli
minikube start --cpus 4 --memory 8192

Mount DAGs, Plugins, etc.

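Mount all your DAGs, plugins, etc. into minikube. This command keeps running in the foreground and the mount is only available while it runs, so leave this terminal open: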

minikube mount {project dir}/src/main/python/:/data

Deploy Airflow in minikube

Open a new terminal, go to the project root directory and run:

kubectl apply -f airflow.kube.yaml

Wait 3-4 minutes for all Airflow components to start.

This sets up the following components:

  • Postgres (stores the Airflow metadata)
  • Redis (broker for the Celery executor)
  • Airflow Scheduler
  • Celery Workers
  • Airflow Web Server
  • Flower
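Instead of waiting a fixed 3-4 minutes for these components, you can poll until their pods are ready. kubectl can do this on its own (kubectl wait --for=condition=Ready pods --all --timeout=300s); the Python sketch below performs a similar check by parsing kubectl get pods -o json, which may be convenient if your test harness is already in Python. The function names are hypothetical, and it assumes kubectl is on your PATH, points at minikube, and that Airflow runs in the current namespace.

```python
import json
import subprocess
import time


def pods_ready(pods):
    """Return True if every pod is Running and all its containers are ready.

    `pods` is the parsed output of `kubectl get pods -o json`.
    Note: completed Job pods (phase "Succeeded") make this return False.
    """
    items = pods.get("items", [])
    if not items:
        return False
    for pod in items:
        status = pod.get("status", {})
        if status.get("phase") != "Running":
            return False
        containers = status.get("containerStatuses", [])
        if not containers or not all(c.get("ready") for c in containers):
            return False
    return True


def wait_for_airflow(timeout_s=300, poll_s=10):
    """Poll kubectl until all pods are ready; return False on timeout."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        out = subprocess.run(
            ["kubectl", "get", "pods", "-o", "json"],
            capture_output=True, text=True, check=True,
        ).stdout
        if pods_ready(json.loads(out)):
            return True
        time.sleep(poll_s)
    return False
```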

Access Airflow

Get the minikube IP by running:

minikube ip

Then use that IP to access:

**Airflow UI:** {minikube-ip}:31317 

**Flower:** {minikube-ip}:32081
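Scripts or end-to-end tests that need these addresses can build them from the minikube IP and the NodePorts listed above (31317 for the Airflow UI, 32081 for Flower). A minimal sketch; the function names are hypothetical, and it assumes minikube is on your PATH and that both services are served over plain HTTP.

```python
import subprocess

# NodePorts exposed by airflow.kube.yaml, as listed above.
NODE_PORTS = {"airflow_ui": 31317, "flower": 32081}


def minikube_ip():
    """Return the IP address reported by `minikube ip`."""
    result = subprocess.run(
        ["minikube", "ip"], capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


def service_urls(ip, ports=NODE_PORTS):
    """Map each service name to its http://{ip}:{port} URL."""
    return {name: f"http://{ip}:{port}" for name, port in ports.items()}
```

Usage: print(service_urls(minikube_ip())) prints the Airflow UI and Flower URLs for your running cluster.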

How does Airflow work in minikube?

[Figure: Airflow architecture in minikube]

How to run the end-to-end tests?

  1. Install in minikube all the components your DAGs need. The integration tests in this repo require MySQL and Presto:

     kubectl apply -f {SourceDir}/airflow-testing/k8s/mysql/mysql.kube.yaml
     kubectl apply -f {SourceDir}/airflow-testing/k8s/presto/presto.kube.yaml

  2. Run the integration tests in a Docker container, passing the minikube IP as the last argument. As for the unit tests, {SourceDir} is the absolute path of the directory that contains your clone of this repository.

    docker run -ti -v {SourceDir}/airflow-testing:/opt --entrypoint /mnt/entrypoint.sh airflow-test run_integration_tests {minikube-ip}
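An end-to-end test typically triggers a DAG, waits for its run to reach a terminal state, and then asserts on the data it produced. The waiting step is generic, so the sketch below is only a polling helper under that assumption. get_state is a hypothetical callable you would implement for your Airflow version (for example via the Airflow CLI or REST API); "success" and "failed" are Airflow's terminal DagRun states. The sleep and clock parameters are injectable so the helper can be tested without real waiting.

```python
import time

# Terminal DagRun states in Airflow.
TERMINAL_STATES = {"success", "failed"}


def wait_for_dag_run(get_state, timeout_s=600, poll_s=15,
                     sleep=time.sleep, clock=time.monotonic):
    """Poll get_state() until it returns a terminal DagRun state.

    Returns the final state; raises TimeoutError if none is reached in time.
    """
    deadline = clock() + timeout_s
    while True:
        state = get_state()
        if state in TERMINAL_STATES:
            return state
        if clock() >= deadline:
            raise TimeoutError(f"DAG run still '{state}' after {timeout_s}s")
        sleep(poll_s)
```

After the helper returns "success", the test would go on to assert on the pipeline's output, e.g. by querying the MySQL tables the DAG writes to.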