Data Engineering Zoomcamp 2022

My Course Notes

Week 1: Introduction & Prerequisites

Week 2: Data Ingestion

  • Data Lakes
  • Data Pipeline Orchestration with Airflow

Week 3: Data Warehouse and BigQuery

  • Basics of Data Warehousing and BigQuery
  • Ingesting data into BQ Data Warehouse with Airflow
  • Optimizing performance and cost with partitioning and clustering in BQ
  • Machine Learning in BQ

Week 4: Analytics Engineering and dbt

  • ETL vs ELT
  • dbt basics
  • Transformations in the data warehouse and dbt Cloud
  • dbt Project Repo
  • Dashboards in Google Data Studio

Week 5: Batch Processing

  • Batch vs Streaming
  • Installing Spark
  • Spark SQL and DataFrames
  • Spark Internals