Airflow_Retail_Pipeline

This project is inspired by the video: Data Engineer Project: An end-to-end Airflow data pipeline with BigQuery, dbt, Soda, and more!

Prerequisites

  • Have Docker installed

    To install, check: Docker Desktop Install

  • Have Astro CLI installed

    If you use brew, you can run: brew install astro

    For other systems, please refer to: Install Astro CLI

  • Have a Soda account

    You can get a 45-day free trial: Soda

  • Have a Google Cloud account

    You can create your account here: Google Cloud

Getting Started

  1. Run astro dev init to create the necessary files for your environment.

  2. Run astro dev start to start the Airflow services with Docker.

  3. Download the dataset from Kaggle - Online Retail

    • Create a folder dataset inside the include directory and add your CSV file there.
  4. Create a Google Cloud Bucket.

    • Create a folder called input inside the bucket (see the upload sketch after this list).
  5. Create a Service Account.

    • Grant it access to Cloud Storage with the "Storage Admin" role.

    • Grant it access to BigQuery with the "BigQuery Admin" role.

  6. Create a JSON key for the Service Account.

    • Create a folder gcp inside the include directory and add your JSON key there.
  7. Create a connection in the Airflow UI that points to the path of the JSON key (a minimal DAG sketch using this connection is shown below).
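
As a quick check of steps 3 and 4, the sketch below uploads the CSV from include/dataset to the bucket's input folder with the google-cloud-storage client. This is only a minimal sketch: the bucket name and CSV file name are placeholders, and it assumes the service-account key from step 6 is exposed through the GOOGLE_APPLICATION_CREDENTIALS environment variable.

```python
# Minimal sketch: upload the Kaggle CSV from include/dataset to the bucket's
# "input" folder. Bucket and file names below are placeholders -- replace
# them with your own values.
from google.cloud import storage

BUCKET_NAME = "your-retail-bucket"               # assumed bucket name
LOCAL_CSV = "include/dataset/online_retail.csv"  # assumed CSV file name
DESTINATION = "input/online_retail.csv"          # object path inside the bucket

# The client reads credentials from GOOGLE_APPLICATION_CREDENTIALS,
# e.g. the service-account key stored under include/gcp/.
client = storage.Client()
bucket = client.bucket(BUCKET_NAME)
bucket.blob(DESTINATION).upload_from_filename(LOCAL_CSV)
print(f"Uploaded {LOCAL_CSV} to gs://{BUCKET_NAME}/{DESTINATION}")
```

After running it once, the file should be visible under the input folder of your bucket, ready to be picked up by the pipeline.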
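Once the connection from step 7 exists, a DAG can reference it by its connection id. The sketch below is an assumption-heavy example, not the project's final pipeline: it assumes a connection id of gcp, the default Astro container path /usr/local/airflow, and the same placeholder bucket and file names as above. You may also need apache-airflow-providers-google in requirements.txt for the operator import to resolve.

```python
# Minimal DAG sketch assuming an Airflow connection with id "gcp" that points
# to the JSON key under include/gcp/. Bucket and file names are placeholders.
from datetime import datetime

from airflow.decorators import dag
from airflow.providers.google.cloud.transfers.local_to_gcs import (
    LocalFilesystemToGCSOperator,
)

@dag(start_date=datetime(2024, 1, 1), schedule=None, catchup=False, tags=["retail"])
def retail():
    # Push the local CSV from include/dataset to the bucket's "input" folder,
    # authenticating through the "gcp" connection created in the Airflow UI.
    upload_csv_to_gcs = LocalFilesystemToGCSOperator(
        task_id="upload_csv_to_gcs",
        src="/usr/local/airflow/include/dataset/online_retail.csv",
        dst="input/online_retail.csv",
        bucket="your-retail-bucket",
        gcp_conn_id="gcp",
        mime_type="text/csv",
    )

retail()
```

If the connection and key are set up correctly, triggering this DAG from the Airflow UI started in step 2 should upload the CSV to the bucket without any extra configuration in the task itself.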