- Google Cloud Platform
- Terraform
- Docker
- SQL
- Airflow
- dbt
- Spark
- Kafka
- Setting up the Environment
- Google Cloud Account
- Docker
- Terraform
- Running Postgres in Docker
- Taking a look at the NY Taxi dataset
- SQL refresher
- Data Lake
- What is a Data Lake
- ETL vs ELT
- Using GCS
- Orchestration
- What is an Orchestration Pipeline
- Data Ingestion
- Introducing & Using Airflow
- Demo
- Setting up Airflow with Docker
- Data Ingestion DAG
- Extraction
- Pre-processing (parquet, partitioning)
- Loading
- Exploration with BigQuery
- Best Practices
- What is a Data Warehouse?
- What is BigQuery?
- Partitioning and Clustering
- With Airflow
- Best Practices
- What is dbt and how does it fit the tech stack?
- Using dbt:
- Anatomy of a dbt model
- Seeds
- Jinja, Macros and tests
- Documentation
- Packages
- Build a dashboard in Google Data Studio
- Spark internals
- Broadcasting
- Partitioning
- Shuffling
- Spark + Airflow
- Apache Flink as an alternative
- Basics of Kafka
- Consumer-Producer
- Kafka Streams
- Kafka Connect