Data Engineering Zoomcamp 2022
- Start: 17 January 2022
- Main Course GitHub Repo
- DataTalks.Club's Slack. Join the #course-data-engineering channel
- Videos are published to DataTalks.Club's YouTube channel in the course playlist
My Course Notes
Week 1: Introduction & Prerequisites
Week 2: Data Ingestion
- Data Lakes
- Data Pipeline Orchestration with Airflow
Week 3: Data Warehouse and BigQuery
- Basics of Data Warehousing and BigQuery
- Ingesting data into BQ Data Warehouse with Airflow
- Optimizing performance and cost with partitioning and clustering in BQ
- Machine Learning in BQ
Week 4: Analytics Engineering and dbt
- ETL vs ELT
- dbt basics
- Transformations in the data warehouse and dbt Cloud
- dbt Project Repo
- Dashboards in Google Data Studio
Week 5: Batch Processing
- Batch vs Streaming
- Installing Spark
- Spark SQL and DataFrames
- Spark Internals