This project is inspired by the video: Data Engineer Project: An end-to-end Airflow data pipeline with BigQuery, Dbt, Soda, and more!
- Have Docker installed. To install it, check: Docker Desktop Install
- Have the Astro CLI installed. If you use brew, you can run:

  ```shell
  brew install astro
  ```

  For other systems, please refer to: Install Astro CLI
- Have a Soda account. You can get a 45-day free trial: Soda
- Have a Google Cloud account. You can create your account here: Google Cloud
- Run `astro dev init` to create the necessary files for your environment.
- Run `astro dev start` to start the Airflow service with Docker.
- Download the dataset from Kaggle: Online Retail
- Create a folder `dataset` inside the `include` directory and add your CSV file there.
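A minimal shell sketch of this step, assuming you run it from the project root; the filename `online_retail.csv` is a placeholder, since the actual Kaggle download may be named differently:

```shell
# Create the dataset folder inside include/ (run from the project root).
mkdir -p include/dataset

# Move the downloaded CSV there; the filename below is a placeholder.
# mv ~/Downloads/online_retail.csv include/dataset/online_retail.csv
```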
- Create a Google Cloud Storage bucket.
- Create a folder called `input` inside the bucket.
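If you prefer the command line to the Cloud Console, the bucket and its `input` folder can be sketched with `gsutil`. The bucket name below is a placeholder (bucket names must be globally unique), and the `gsutil` calls assume an authenticated Google Cloud SDK, so they are guarded to no-op when the tool is missing:

```shell
# Placeholder bucket name -- replace with your own globally unique name.
BUCKET="my-online-retail-bucket"

# Create the bucket (skipped if gsutil is not installed/authenticated).
command -v gsutil >/dev/null && gsutil mb "gs://${BUCKET}" || true

# GCS "folders" are just object prefixes: uploading under input/ creates it.
command -v gsutil >/dev/null && \
  gsutil cp include/dataset/online_retail.csv "gs://${BUCKET}/input/online_retail.csv" || true
```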
- Create a Service Account.
  - Grant access to Cloud Storage as "Storage Admin".
  - Grant access to BigQuery as "BigQuery Admin".
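The Service Account and its two role grants can also be sketched with `gcloud`. The project id and account name below are placeholders, and each call is guarded so the sketch is a no-op without an authenticated SDK:

```shell
PROJECT="my-gcp-project"    # placeholder project id
SA_NAME="airflow-retail"    # placeholder service account name
SA_EMAIL="${SA_NAME}@${PROJECT}.iam.gserviceaccount.com"

# Create the service account (skipped if gcloud is not installed).
command -v gcloud >/dev/null && \
  gcloud iam service-accounts create "${SA_NAME}" --project "${PROJECT}" || true

# Grant Storage Admin on Cloud Storage.
command -v gcloud >/dev/null && \
  gcloud projects add-iam-policy-binding "${PROJECT}" \
    --member "serviceAccount:${SA_EMAIL}" --role "roles/storage.admin" || true

# Grant BigQuery Admin on BigQuery.
command -v gcloud >/dev/null && \
  gcloud projects add-iam-policy-binding "${PROJECT}" \
    --member "serviceAccount:${SA_EMAIL}" --role "roles/bigquery.admin" || true
```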
- Create a JSON key for the Service Account.
- Create a folder `gcp` inside the `include` directory and add your JSON key there.
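A shell sketch of this step, run from the project root; the service-account email and the key filename are placeholders, and the key generation is skipped without an authenticated gcloud install:

```shell
# Create the gcp folder inside include/ (run from the project root).
mkdir -p include/gcp

# Generate a JSON key there; the service-account email is a placeholder.
command -v gcloud >/dev/null && \
  gcloud iam service-accounts keys create include/gcp/service_account.json \
    --iam-account "airflow-retail@my-gcp-project.iam.gserviceaccount.com" || true
```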
- Create a connection in the Airflow UI using the path of the JSON key.
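As an alternative to the UI, a connection can be declared as an environment variable (for example in the project's `.env` file, which the Astro CLI loads into the containers). This is a sketch: the connection id `gcp`, the project id, and the key filename are assumptions, and the path assumes Astro mounts `include/` at `/usr/local/airflow/include` inside the container:

```shell
# Airflow picks up connections from AIRFLOW_CONN_<CONN_ID> variables;
# since Airflow 2.3 the value may be a JSON document like this one.
export AIRFLOW_CONN_GCP='{
  "conn_type": "google_cloud_platform",
  "extra": {
    "key_path": "/usr/local/airflow/include/gcp/service_account.json",
    "project": "my-gcp-project"
  }
}'
```

In a `.env` file you would drop the `export` and keep the JSON on a single line.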