This repository contains my solutions for the exercises and projects in the Data Engineering Zoomcamp by DataTalksClub.
The course covers the following topics and contains code and examples for these:
- Google Cloud (Cloud Storage and BigQuery)
- Terraform
- Shell scripting
- Docker and containerization
- Python (specifically Prefect, a Python-based ETL/ELT tool)
- SQL
- dbt

In the course, you create cloud infrastructure with Terraform, cover the principles of Docker and containerization, build Python-based ETL pipelines with Prefect, work with the data in Google Cloud Storage and Google BigQuery, create models in dbt, and build reports from the data.
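
For readers new to the pattern, the extract-transform-load flow mentioned above can be sketched in plain Python. This is only a minimal illustration using the standard library, not the Prefect-based pipelines from the course. The function names, the sample rows, and the `trips` table are all hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical sample data standing in for a downloaded CSV file.
SAMPLE_CSV = """trip_id,distance_km,fare
1,2.5,10.0
2,0.0,3.5
3,7.1,22.4
"""


def extract(raw: str) -> list:
    """Parse CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(raw)))


def transform(rows: list) -> list:
    """Cast columns to numeric types and drop zero-distance trips."""
    cleaned = []
    for row in rows:
        distance = float(row["distance_km"])
        if distance > 0:
            cleaned.append((int(row["trip_id"]), distance, float(row["fare"])))
    return cleaned


def load(rows: list, conn: sqlite3.Connection) -> int:
    """Insert the cleaned rows into a `trips` table and return its row count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS trips "
        "(trip_id INTEGER, distance_km REAL, fare REAL)"
    )
    conn.executemany("INSERT INTO trips VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn.execute("SELECT COUNT(*) FROM trips").fetchone()[0]


def run_pipeline(raw: str, conn: sqlite3.Connection) -> int:
    """Chain the three steps: extract, then transform, then load."""
    return load(transform(extract(raw)), conn)


if __name__ == "__main__":
    # An in-memory SQLite database stands in for a real warehouse.
    connection = sqlite3.connect(":memory:")
    print(run_pipeline(SAMPLE_CSV, connection))
    connection.close()
```

In Prefect 2, functions like these are typically decorated with `@task` and composed inside a function decorated with `@flow`, and the load step would target Cloud Storage or BigQuery rather than SQLite.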
I created a Dev Container that includes all required dependencies for the course. This includes:
- Python 3.9
- Pandas
- SQLAlchemy
- PySpark
- PyArrow
- Polars
- Prefect and all required Python dependencies
- confluent-kafka
- scikit-learn
- Snowpark
- ipykernel
- Google Cloud SDK
- Azure CLI
- GitHub CLI
- GitLens
- GitHub Pull Requests
- dbt-core
- dbt-postgres
- dbt-bigquery
- dbt extensions for VS Code
- Snowflake for VS Code
- MS SQL Server for VS Code
- Terraform
- Jupyter Notebooks for VS Code
- Docker
- Spark
- JDK version 11
- XML tools
- YAML tools
- Oh My Posh PowerShell themes

The Data Engineering Zoomcamp was created by DataTalksClub and is an amazing resource for anyone looking to learn more about data engineering. As a practicing data engineer, I personally found the course to be great and learned a lot. Thank you to the instructors and contributors who created the course materials and provided guidance throughout the program.