An end-to-end data engineering project using Google Cloud, Prefect and dbt.

Data Engineering Zoomcamp 2023 Cohort

This repository contains my solutions for the exercises and projects in the Data Engineering Zoomcamp by DataTalksClub.

Course Outline

The course covers the following topics, and this repository contains code and examples for each:

  • Google Cloud (Cloud Storage and BigQuery)
  • Terraform
  • Shell scripting
  • Docker and containerization
  • Python (specifically Prefect, a Python-based ETL/ELT tool)
  • SQL
  • dbt

In the course, you create cloud infrastructure with Terraform, learn the principles of Docker and containerization, build Python-based ETL pipelines with Prefect, work with the data in Google Cloud Storage and Google BigQuery, build models in dbt, and create reports from the data.
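
The extract, transform, load shape of such a pipeline can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not code from this repository: the sample data, column names, and function names are all hypothetical, and an in-memory list stands in for a real destination such as Cloud Storage or BigQuery. In the course, each step would typically be wrapped as a Prefect task inside a flow.

```python
import csv
import io

# Hypothetical sample data standing in for a downloaded source file.
RAW_CSV = """trip_id,passenger_count,fare_amount
1,2,12.50
2,0,7.00
3,1,
4,3,30.25
"""


def extract(raw_text: str) -> list[dict]:
    """Parse CSV text into a list of row dictionaries (all values are strings)."""
    return list(csv.DictReader(io.StringIO(raw_text)))


def transform(rows: list[dict]) -> list[dict]:
    """Drop rows with a missing fare or zero passengers, then cast column types."""
    cleaned = []
    for row in rows:
        if not row["fare_amount"] or int(row["passenger_count"]) == 0:
            continue
        cleaned.append(
            {
                "trip_id": int(row["trip_id"]),
                "passenger_count": int(row["passenger_count"]),
                "fare_amount": float(row["fare_amount"]),
            }
        )
    return cleaned


def load(rows: list[dict], destination: list) -> int:
    """Append rows to the destination (a list standing in for a warehouse table)."""
    destination.extend(rows)
    return len(rows)


def run_pipeline(raw_text: str, destination: list) -> int:
    """Run extract, transform, and load in order; return the number of rows loaded."""
    return load(transform(extract(raw_text)), destination)


if __name__ == "__main__":
    warehouse = []
    loaded = run_pipeline(RAW_CSV, warehouse)
    print(f"Loaded {loaded} rows")
```

Keeping each stage as a small function with plain inputs and outputs makes it straightforward to test the stages separately and to hand them to an orchestrator later.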

Prerequisites

I created a Dev Container that includes all required dependencies for the course. This includes:

  • Python 3.9
    • Pandas
    • SQLAlchemy
    • PySpark
    • PyArrow
    • Polars
    • Prefect and all required Python dependencies
    • confluent-kafka
    • scikit-learn
    • Snowpark
    • ipykernel
  • Google Cloud SDK
  • Azure CLI
  • GitHub CLI
  • GitLens
  • GitHub Pull Requests
  • dbt-core
    • dbt-postgres
    • dbt-bigquery
  • dbt extensions for VS Code
  • Snowflake for VS Code
  • MS SQL Server for VS Code
  • Terraform
  • Jupyter Notebooks for VS Code
  • Docker
  • Spark
  • JDK version 11
  • XML tools
  • YAML tools
  • Oh My Posh PowerShell themes

Acknowledgments

The Data Engineering Zoomcamp was created by DataTalksClub and is an amazing resource for anyone looking to learn more about data engineering. As a practicing data engineer, I personally found the course to be great and learned a lot. Thank you to the instructors and contributors who created the course materials and provided guidance throughout the program.