Resources and projects from the Udacity Data Engineering with AWS Nanodegree program
Data modeling with Apache Cassandra
In this project,
- Apply data modeling concepts with Apache Cassandra and complete an ETL pipeline using Python (a minimal sketch follows this list).
- Model the data by creating tables in Apache Cassandra designed around the queries to be run.
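As a rough illustration of the pipeline shape (not the project's exact code), the sketch below uses the DataStax cassandra-driver to create a query-driven table and load rows from a hypothetical event_data.csv; the keyspace, table, and column names are placeholders.

```python
# Minimal sketch, assuming a local Cassandra node and a hypothetical
# event_data.csv with sessionId, itemInSession, artist, and song columns.
import csv
from cassandra.cluster import Cluster

cluster = Cluster(['127.0.0.1'])   # connect to a local Cassandra node
session = cluster.connect()

# Keyspace and table names here are illustrative, not the project's exact ones.
session.execute("""
    CREATE KEYSPACE IF NOT EXISTS sparkify
    WITH REPLICATION = {'class': 'SimpleStrategy', 'replication_factor': 1}
""")
session.set_keyspace('sparkify')

# In Cassandra the table is modeled around the query: here, "give me the
# artist and song heard for a given session and item in that session".
session.execute("""
    CREATE TABLE IF NOT EXISTS songs_by_session (
        session_id int,
        item_in_session int,
        artist text,
        song text,
        PRIMARY KEY (session_id, item_in_session)
    )
""")

with open('event_data.csv', newline='') as f:
    for row in csv.DictReader(f):
        session.execute(
            "INSERT INTO songs_by_session (session_id, item_in_session, artist, song) "
            "VALUES (%s, %s, %s, %s)",
            (int(row['sessionId']), int(row['itemInSession']), row['artist'], row['song'])
        )

cluster.shutdown()
```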
Data warehousing with AWS Redshift
In this project,
- Apply data warehousing and AWS concepts to build an ETL pipeline for a database hosted on Redshift.
- Load data from S3 into staging tables on Redshift, then execute SQL statements that create the analytics tables from those staging tables (a minimal sketch follows this list).
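A minimal sketch of the S3-to-Redshift flow, assuming psycopg2, a reachable Redshift cluster, and an IAM role with S3 read access; the host, bucket, table, and column names are illustrative placeholders, not the project's actual schema.

```python
# Minimal sketch, assuming placeholder cluster, bucket, and role values.
import psycopg2

conn = psycopg2.connect(
    host='redshift-cluster.example.us-west-2.redshift.amazonaws.com',
    dbname='dev', user='awsuser', password='...', port=5439
)
cur = conn.cursor()

# Load raw JSON events from S3 into a staging table with COPY.
cur.execute("""
    COPY staging_events
    FROM 's3://example-bucket/log_data'
    IAM_ROLE 'arn:aws:iam::123456789012:role/example-redshift-role'
    FORMAT AS JSON 'auto'
    REGION 'us-west-2'
""")

# Transform staging rows into an analytics (fact) table.
cur.execute("""
    INSERT INTO songplays (start_time, user_id, song_id, session_id)
    SELECT TIMESTAMP 'epoch' + ts / 1000 * INTERVAL '1 second',
           user_id, song_id, session_id
    FROM staging_events
    WHERE page = 'NextSong'
""")

conn.commit()
cur.close()
conn.close()
```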
Data lakehouse with Spark and AWS Glue
In this project,
- Use Spark and AWS Glue to process data from multiple sources, categorize the data, and curate it so it can be queried later for multiple purposes.
- Build a data lakehouse solution for sensor data that trains a machine learning model (a minimal sketch follows this list).
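A minimal PySpark sketch of a landing-to-trusted-to-curated flow for sensor data, of the kind a Glue job would run; the S3 paths, zone names, and column names are assumptions for illustration only.

```python
# Minimal sketch, assuming placeholder S3 paths and JSON fields
# (user, timestamp, x, y, z, email).
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("sensor_lakehouse").getOrCreate()

# Landing zone: raw JSON sensor readings as dropped into S3.
landing = spark.read.json("s3://example-lake/landing/accelerometer/")

# Trusted zone: keep only readings from customers who consented to share data.
consented = spark.read.json("s3://example-lake/trusted/customers/").select("email")
trusted = landing.join(consented, landing.user == consented.email, "inner")

# Curated zone: the cleaned table a downstream ML model can train on.
curated = trusted.select("user", "timestamp", "x", "y", "z")
curated.write.mode("overwrite").parquet("s3://example-lake/curated/accelerometer/")
```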
Data pipelines with Apache Airflow
In this project,
- Use Airflow to create high-grade data pipelines that are dynamic, built from reusable tasks, monitorable, and allow easy backfills.
- Create custom operators to perform tasks such as staging the data, filling the data warehouse, and running checks on the data as the final step (a minimal operator sketch follows this list).
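A minimal sketch of one such custom operator, a data quality check that fails if a table is empty, assuming Airflow 2.x with the Postgres provider installed; the class, connection, and table names are illustrative, not the project's exact ones.

```python
# Minimal sketch of a custom data quality operator (illustrative names).
from airflow.models import BaseOperator
from airflow.providers.postgres.hooks.postgres import PostgresHook


class DataQualityOperator(BaseOperator):
    """Fail the task if any of the given tables is empty."""

    def __init__(self, redshift_conn_id, tables, **kwargs):
        super().__init__(**kwargs)
        self.redshift_conn_id = redshift_conn_id
        self.tables = tables

    def execute(self, context):
        hook = PostgresHook(postgres_conn_id=self.redshift_conn_id)
        for table in self.tables:
            records = hook.get_records(f"SELECT COUNT(*) FROM {table}")
            if not records or records[0][0] < 1:
                raise ValueError(f"Data quality check failed: {table} is empty")
            self.log.info("Data quality check passed for %s", table)
```

In a DAG this would run as the final task, e.g. `DataQualityOperator(task_id="run_quality_checks", redshift_conn_id="redshift", tables=["songplays", "users"])`.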