Data-Streaming-Project

Uses Airflow, PostgreSQL, Kafka, Spark, Cassandra, and GitHub Actions to establish an end-to-end data pipeline

Primary Language: Python

Architecture

(architecture diagram)

Description

I used Airflow, PostgreSQL, Kafka, Spark, and Cassandra to build a fully automated ETL pipeline running in a container runtime, with a GitHub Actions CI workflow that automates updates of the services' Docker images on Docker Hub.
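The repository's exact schema isn't shown here, but the core ETL step of such a pipeline, consuming a JSON event from Kafka and shaping it into a row for Cassandra, can be sketched in pure Python. The field names (`user_id`, `ts`, `data`) are illustrative assumptions, not the project's actual topic schema:

```python
import json
from datetime import datetime, timezone

def transform(raw: bytes) -> dict:
    """Shape one Kafka message (JSON bytes) into a Cassandra-ready row.

    Field names here are hypothetical -- adapt them to the real topic schema.
    """
    event = json.loads(raw)
    return {
        "user_id": str(event["user_id"]),
        # Normalize the epoch timestamp to an ISO-8601 UTC string
        "event_time": datetime.fromtimestamp(
            event["ts"], tz=timezone.utc
        ).isoformat(),
        "payload": json.dumps(event.get("data", {})),
    }

# In the real pipeline, a Spark streaming job (or a plain Kafka consumer)
# would apply a step like transform() to each message before issuing a
# CQL INSERT into Cassandra.
```

Keeping the transform a pure function like this makes it easy to unit-test in CI before the Docker image is pushed.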

Get Started

  • Clone the repository
    • git clone https://github.com/moontucer/Data-Streaming-Project/
  • Go to the project folder
    • cd Data-Streaming-Project
  • Build the environment with Docker Compose
    • docker-compose up
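After `docker-compose up`, the services can take a while to become ready; a small TCP probe like the sketch below can confirm they are accepting connections before you trigger anything in Airflow. The ports listed are each service's well-known default, an assumption to check against the project's docker-compose.yml:

```python
import socket
import time

def wait_for_port(host: str, port: int, timeout: float = 60.0) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=2):
                return True
        except OSError:
            time.sleep(1)
    return False

if __name__ == "__main__":
    # Default ports for each service -- assumptions, verify in docker-compose.yml.
    for name, port in [("postgres", 5432), ("kafka", 9092),
                       ("spark", 7077), ("cassandra", 9042),
                       ("airflow", 8080)]:
        up = wait_for_port("localhost", port, timeout=90)
        print(f"{name}: {'up' if up else 'not reachable'}")
```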

Link to the Medium article

https://medium.com/@moontucer/data-streaming-project-real-time-end-to-end-data-pipeline-082f0d9cfbdb