/dataops-platform-airflow-dbt

Build DataOps platform with Apache Airflow and dbt on AWS

Primary LanguagePythonMIT No AttributionMIT-0

DataOps Platform with Apache Airflow and dbt on AWS

This repository contains code to deploy the architecture described in the blog post: "Build DataOps platform to break silos between engineers and analysts".


Architecture overview

Architecture

The architecture includes following AWS services:

  • Amazon Elastic Container Service, to run Apache Airflow and dbt
  • Amazon Elastic Container Repository, to store Docker images for Airflow and dbt
  • Amazon Redshift, as data warehouse
  • Amazon Relational Database System, as metadata store for Airflow
  • Amazon ElastiCache for Redis, as a Celery backend for Airflow
  • Amazon Simple Storage Service, to store Airflow and dbt DAGs
  • AWS CodeBuild (optional), automate deployments

Repository structure

In this repository there are two main project folders: dataops-infra and analytics. This setup is meant to demonstrate how DataOps can foster effective collaboration between data engineers and data analysts, separating the platform infrastructure code from the business logic.

These two folders should be considered as two separate repositories following their own release cycles.

DataOps Platform Infrastructure

The dataops-infra folder contains code and intructions to deploy the platform infrastructure described in the Architecture overview section. This project is created from the prospective of a data engineering team that is responsible for creating and maintaining data infrastructure such as data lake, data warehouse, orchestration, and CI/CD pipelines for analytics.

Analytics

The analytics folder contains code and instructions to manage and deploy Airflow and dbt DAGs on the DataOps platform. This project is created from the prospective of a data analytics team composed of data analysts and data scientists. They have domain knowledge and are responsible for serving analytics requests from different stakeholders such as marketing and business development teams so that company can make data driven decisions.

License

This library is licensed under the MIT-0 License. See the LICENSE file.