Airflow + Spark Docker environment

A local containerized development environment using Docker, Airflow, and Spark.

Table of Contents

  • General Information
  • Useful Articles
  • Technologies Used
  • Screenshots
  • Setup
  • Usage
  • Project Status
  • Room for Improvement
  • Contact

General Information

  • This project aims to provide a quick and easy way to provision a local development environment with Airflow and Spark for data enthusiasts.
  • It uses the docker-compose.yaml available on the Airflow website as a base: the Airflow image is extended to install the project's requirements, and the services in docker-compose.yaml are modified as needed (see the sketch after this list).
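
Conceptually, extending the image and bringing the stack up with Docker Compose amounts to something like the following. This is a generic sketch, not the contents of this project's scripts; the actual build context and service definitions live in the repository's Dockerfile and docker-compose.yaml:

# Build the extended Airflow image declared in the repository's Dockerfile / docker-compose.yaml
docker compose build

# Start all services (Airflow components plus the Spark master and worker) in the background
docker compose up -d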

Useful Articles

Technologies Used

  • Docker
  • Apache Airflow
  • Apache Spark

Screenshots

  • Containers
  • DAG run
  • Spark master

Setup

Clone the repository, create the required directories, set the Airflow user id in .env, make the helper scripts executable, and build the environment:

git clone git@github.com:razevedo1994/airflow_and_spark_docker_environment.git
mkdir -p ./logs ./plugins
echo -e "AIRFLOW_UID=$(id -u)" > .env
chmod +x ./build_environment.sh ./reset_environment.sh
./build_environment.sh
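
Once ./build_environment.sh finishes, it is worth confirming that the containers are up. The commands below assume the script wraps Docker Compose and that the webserver exposes Airflow's standard /health endpoint; adjust them to match this project's docker-compose.yaml:

# List the services and check that they are running/healthy
docker compose ps

# Query the Airflow webserver health endpoint
curl http://localhost:8080/health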

Usage

  • In the Airflow UI (Admin > Connections), create a connection pointing to the Spark master (a command-line alternative is sketched after this list).

  • To access the Airflow UI: http://localhost:8080/

  • To access the Spark master UI: http://localhost:8181/
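
If you prefer the command line, a Spark connection can also be created with the Airflow CLI. The connection id, host, and port below are illustrative assumptions (a Spark master service named spark listening on port 7077); use the values that match this project's docker-compose.yaml:

# Create the Spark connection from inside the webserver container
# (connection id, host, and port are assumptions, not this project's fixed values)
docker compose exec airflow-webserver \
  airflow connections add 'spark_default' \
    --conn-type 'spark' \
    --conn-host 'spark://spark' \
    --conn-port '7077'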

Attention: if you want to clean up your environment, run ./reset_environment.sh. Be careful: this command deletes all your images and containers.
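
For reference, ./reset_environment.sh is this project's own script; a roughly equivalent manual cleanup, assuming the stack was started with Docker Compose, looks something like:

# Stop the stack and remove its containers, named volumes, and the images used by the services
docker compose down --volumes --rmi all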

Project Status

Project is: in progress

Room for Improvement

--

Contact

Created by Rodrigo Azevedo - feel free to contact me!