Udacity Data Engineer Nanodegree Projects

You will need to define fact and dimension tables for a star schema for a particular analytic focus, and write an ETL pipeline that transfers data from files in two local directories into these tables in Postgres using Python and SQL.
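A minimal sketch of what that pipeline step can look like, assuming psycopg2 and a local Postgres instance; the database credentials, table, and column names here are placeholders, not the project's actual schema.

```python
# Hypothetical example: insert one row into a star-schema fact table in Postgres.
import psycopg2

SONGPLAY_INSERT = """
INSERT INTO songplays (start_time, user_id, level, song_id, artist_id)
VALUES (%s, %s, %s, %s, %s);
"""

# Connection parameters are placeholders for a local database.
conn = psycopg2.connect("host=127.0.0.1 dbname=sparkifydb user=student password=student")
cur = conn.cursor()
cur.execute(SONGPLAY_INSERT, ("2018-11-01 21:01:46", 8, "free", None, None))
conn.commit()
conn.close()
```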

You will need to model fact and dimension tables for a particular analytic focus, and write an ETL pipeline that transfers CSV files from two local directories into Apache Cassandra tables using Python and CQL.
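A minimal sketch of that flow, assuming the DataStax cassandra-driver package and a local node; the keyspace, table, and sample values are illustrative placeholders rather than the project's actual data model.

```python
# Hypothetical example: create a keyspace and table, then insert one event row.
from cassandra.cluster import Cluster

cluster = Cluster(["127.0.0.1"])
session = cluster.connect()

session.execute("""
    CREATE KEYSPACE IF NOT EXISTS sparkify
    WITH REPLICATION = {'class': 'SimpleStrategy', 'replication_factor': 1}
""")
session.set_keyspace("sparkify")

session.execute("""
    CREATE TABLE IF NOT EXISTS song_plays_by_session (
        session_id int, item_in_session int, artist text, song text, length float,
        PRIMARY KEY (session_id, item_in_session)
    )
""")

session.execute(
    "INSERT INTO song_plays_by_session (session_id, item_in_session, artist, song, length) "
    "VALUES (%s, %s, %s, %s, %s)",
    (338, 4, "Faithless", "Music Matters", 495.3),
)
cluster.shutdown()
```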

You will need to model fact and dimension tables for a star schema for a particular analytic focus, and write an ETL pipeline that transfers JSON files from two S3 directories into Amazon Redshift tables using Python, SQL, and boto3.
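A minimal sketch of the staging step, assuming boto3 for looking up the cluster endpoint and psycopg2 for issuing a Redshift COPY; the cluster identifier, bucket paths, IAM role ARN, and table name are placeholders.

```python
# Hypothetical example: find the Redshift endpoint, then COPY JSON from S3 into a staging table.
import boto3
import psycopg2

redshift = boto3.client("redshift", region_name="us-west-2")
endpoint = redshift.describe_clusters(ClusterIdentifier="dwh-cluster")["Clusters"][0]["Endpoint"]["Address"]

COPY_STAGING_EVENTS = """
COPY staging_events
FROM 's3://<log-data-bucket>/log_data'
IAM_ROLE 'arn:aws:iam::<account-id>:role/<redshift-role>'
FORMAT AS JSON 'auto'
REGION 'us-west-2';
"""

conn = psycopg2.connect(host=endpoint, dbname="dwh", user="dwhuser", password="<password>", port=5439)
cur = conn.cursor()
cur.execute(COPY_STAGING_EVENTS)
conn.commit()
conn.close()
```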

In this project, you will build an ETL pipeline for a data lake hosted on S3. You will load data from S3, process it into analytics tables using Spark, and write them back to S3. Deploying this Spark process on a cluster using AWS EMR is also part of the project.
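A minimal sketch of one such Spark step, assuming PySpark with S3 access configured; the bucket paths and column names are placeholders for one analytics table.

```python
# Hypothetical example: read raw JSON from S3, shape one analytics table, write it back as Parquet.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("data-lake-etl").getOrCreate()

# Input path is a placeholder for the raw JSON data on S3.
song_data = spark.read.json("s3a://<input-bucket>/song_data/*/*/*/*.json")

songs_table = (
    song_data.select("song_id", "title", "artist_id", "year", "duration")
    .dropDuplicates(["song_id"])
)

# Write the table back to S3 as Parquet, partitioned for downstream queries.
songs_table.write.mode("overwrite").partitionBy("year", "artist_id").parquet("s3a://<output-bucket>/songs/")
```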

You will need to create your own custom operators to perform tasks such as staging the data, filling the data warehouse, and running checks on the data as the final step. You will work with Airflow, S3, and a Redshift cluster.
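A minimal sketch of a custom operator for the final data quality step, assuming Airflow 2 with the Postgres provider installed; the class, connection ID, and table list are illustrative, not the project's actual code.

```python
# Hypothetical example: a custom operator that fails the task if any target table is empty.
from airflow.models import BaseOperator
from airflow.providers.postgres.hooks.postgres import PostgresHook


class DataQualityOperator(BaseOperator):
    def __init__(self, redshift_conn_id, tables, **kwargs):
        super().__init__(**kwargs)
        self.redshift_conn_id = redshift_conn_id
        self.tables = tables

    def execute(self, context):
        hook = PostgresHook(postgres_conn_id=self.redshift_conn_id)
        for table in self.tables:
            records = hook.get_records(f"SELECT COUNT(*) FROM {table}")
            if not records or records[0][0] < 1:
                raise ValueError(f"Data quality check failed: {table} returned no rows")
            self.log.info("Data quality check passed for table %s", table)
```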

In this project I decided to build a climate analysis dashboard based on this Kaggle dataset using Docker, InfluxDB, and Grafana.
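A minimal sketch of loading one data point into InfluxDB for Grafana to chart, assuming the influxdb-client package and an InfluxDB 2.x instance; the URL, token, org, bucket, and measurement names are placeholders.

```python
# Hypothetical example: write a single temperature measurement that a Grafana panel could query.
from influxdb_client import InfluxDBClient, Point
from influxdb_client.client.write_api import SYNCHRONOUS

client = InfluxDBClient(url="http://localhost:8086", token="<token>", org="climate")
write_api = client.write_api(write_options=SYNCHRONOUS)

point = (
    Point("temperature")
    .tag("city", "Berlin")
    .field("avg_temp_c", 9.2)
    .time("2013-09-01T00:00:00Z")
)
write_api.write(bucket="climate", record=point)
client.close()
```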