The DE Zoomcamp 2024 cohort is organized by DataTalksClub.
-
Module 1: Docker and Terraform 📦
- Run Postgres and pgAdmin in containers, a best practice for reproducibility.
- Infrastructure as Code (Terraform).
- Service account in GCP: credentials are stored as a .json file in the project directory.
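
Since the service account credentials live in a .json file, a quick sanity check before running Terraform can catch a wrong or truncated file early. The sketch below is only an illustration: the function name `load_service_account_key` is hypothetical, and the field list is an assumption based on the usual shape of a GCP service account key file.

```python
import json
from pathlib import Path

# Fields a GCP service account key file is expected to contain
# (assumed minimum set, not an exhaustive schema).
REQUIRED_FIELDS = {"type", "project_id", "private_key", "client_email"}


def load_service_account_key(path):
    """Load a service account key file and check its basic shape."""
    key = json.loads(Path(path).read_text())
    missing = REQUIRED_FIELDS - set(key)
    if missing:
        raise ValueError(f"key file is missing fields: {sorted(missing)}")
    if key["type"] != "service_account":
        raise ValueError(f"unexpected key type: {key['type']!r}")
    return key
```

Google client libraries and the Terraform Google provider can pick up the same file through the `GOOGLE_APPLICATION_CREDENTIALS` environment variable, so the path passed to this helper would typically be the one that variable points to.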
-
Module 2: Data Orchestration - Mage 🔧
- Mage is the main tool for data orchestration.
- Clone the Mage repository for DE Zoomcamp 2024 directly with git.
- Run the containers to set up the Mage application:
  docker compose build
  docker compose up
- Check out my notes in this Medium article:
  - Using Mage as the Workflow Orchestration Tools 🚀
-
Module 3: Data Warehouse & BigQuery 🏭
- Data warehouse (BigQuery) for OLAP
- External table vs. materialized table
- Query optimization via partitioning and clustering to improve performance and reduce cost.
- Check out my notes in this Medium article:
  - Data Warehouse & BigQuery 🚀
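
To make the partitioning and clustering point concrete, here is a minimal sketch that builds the BigQuery DDL for copying an external table into a partitioned, clustered one. The helper name, dataset, table, and column names are all hypothetical, and it assumes the partition column is a TIMESTAMP or DATETIME. The generated statement would still need to be run in the BigQuery console or through a client.

```python
def partitioned_table_ddl(target, source, partition_col, cluster_cols):
    """Return BigQuery DDL that copies `source` into a partitioned, clustered table."""
    if not 1 <= len(cluster_cols) <= 4:
        # BigQuery accepts at most four clustering columns.
        raise ValueError("cluster_cols must contain between 1 and 4 columns")
    cluster_by = ", ".join(cluster_cols)
    return (
        f"CREATE OR REPLACE TABLE {target}\n"
        f"PARTITION BY DATE({partition_col})\n"
        f"CLUSTER BY {cluster_by} AS\n"
        f"SELECT * FROM {source};"
    )


# Hypothetical names, for illustration only.
ddl = partitioned_table_ddl(
    target="my_dataset.trips_partitioned_clustered",
    source="my_dataset.trips_external",
    partition_col="pickup_datetime",
    cluster_cols=["vendor_id"],
)
print(ddl)
```

Queries that filter on the partition column scan only the matching partitions, which is where the cost saving comes from. Clustering then sorts the data within each partition by the chosen columns.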
-
Workshop 1: Data Load Tool (dlt)
- An open-source library that eases the data loading steps for data engineers.
- Uses generators as the main concept for memory management.
- Check out my notes in this Medium article:
  - Data Ingestion with Data Loads Tool (dlt): Be the Magician in Data Engineering 🚀
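
The generator idea above can be illustrated in plain Python, without dlt itself: records are yielded in small chunks, so only one chunk is held in memory at a time. `fake_source` and `read_in_chunks` are hypothetical names for this sketch and are not part of the dlt API.

```python
def fake_source(n):
    """Stand-in for an API or file reader; yields one row at a time."""
    for i in range(n):
        yield {"id": i}


def read_in_chunks(records, chunk_size):
    """Yield lists of at most `chunk_size` records, one chunk at a time."""
    chunk = []
    for record in records:
        chunk.append(record)
        if len(chunk) == chunk_size:
            yield chunk
            chunk = []
    if chunk:
        # Emit the final, possibly smaller, chunk.
        yield chunk


for chunk in read_in_chunks(fake_source(5), chunk_size=2):
    print(len(chunk), chunk)
```

Because both functions are generators, nothing is read from the source until the loop asks for the next chunk, which is the property that keeps memory use flat on large inputs.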
-
Module 4: Analytics Engineering with dbt 📊
- Check out my Medium article below:
  - From Testing/Documenting of dbt model to deployment in dbt cloud 🚀
-
- Check out my Medium articles below:
  - How to Run Spark on Ubuntu Machine in Google Cloud (PySpark: Basic) 🚀
  - Understand the Spark Cluster: Spark DataFrame and Spark SQL with PySpark 🚀
-
- Check out my Medium article below:
  - How Apache Kafka works internally along with its configuration ? 👨🏻‍💻
-
Workshop 2: Stream Processing with RisingWave 🌊
- RisingWave is a SQL streaming database.
- Check out my Medium article below:
  - Restructure the Stream Processing With RisingWave 🌊