Data-Engineering-With-AWS

Resources and projects from the Udacity Data Engineering with AWS Nanodegree program.

Primary language: Jupyter Notebook

Projects

Data Modeling

Data modeling with Apache Cassandra

In this project,

  • Apply data modeling concepts with Apache Cassandra and complete an ETL pipeline using Python.
  • Model the data by creating tables in Apache Cassandra that support the queries to be run.
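
As a rough illustration of modeling tables around queries, the sketch below builds a CQL `CREATE TABLE` statement whose primary key matches the columns a query filters on. The table name, column names, and the helper `create_table_cql` are hypothetical and not taken from the project notebooks; the statement is only built as a string and nothing is sent to a Cassandra cluster.

```python
# Hypothetical query to support:
#   SELECT artist, song_title FROM songs_by_session
#   WHERE session_id = ? AND item_in_session = ?
# The columns used in the WHERE clause become the primary key.

def create_table_cql(table, columns, partition_key, clustering_key=()):
    """Build a CREATE TABLE statement from column types and key columns."""
    cols = ", ".join(f"{name} {cql_type}" for name, cql_type in columns.items())
    # The partition key goes in its own parentheses; clustering columns follow.
    primary_key = "(" + ", ".join(partition_key) + ")"
    if clustering_key:
        primary_key += ", " + ", ".join(clustering_key)
    return f"CREATE TABLE IF NOT EXISTS {table} ({cols}, PRIMARY KEY ({primary_key}))"


statement = create_table_cql(
    table="songs_by_session",  # hypothetical table name
    columns={
        "session_id": "int",
        "item_in_session": "int",
        "artist": "text",
        "song_title": "text",
    },
    partition_key=["session_id"],
    clustering_key=["item_in_session"],
)
print(statement)
```

With a real cluster, a string like this would be passed to a driver session; that step is left out here because connection details are not part of this README.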

Cloud Data Warehouses

Data warehousing with AWS Redshift

In this project,

  • Apply data warehouse and AWS concepts to build an ETL pipeline for a database hosted on Redshift.
  • Load data from S3 into staging tables on Redshift, then execute SQL statements that create the analytics tables from these staging tables.
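
The staging-then-transform pattern described above can be sketched locally. The example below uses an in-memory SQLite database purely as a stand-in for Redshift, so it runs without an AWS account; on Redshift the load step would typically be a `COPY` from S3 instead of `executemany`. The table names, columns, and sample rows are hypothetical.

```python
import sqlite3


def run_etl(rows):
    """Load raw rows into a staging table, then build an analytics table from it."""
    conn = sqlite3.connect(":memory:")  # stand-in for a Redshift connection
    cur = conn.cursor()

    # Staging table mirrors the raw event layout (hypothetical columns).
    cur.execute(
        "CREATE TABLE staging_events (user_id INTEGER, first_name TEXT, song TEXT)"
    )
    # Analytics (dimension) table derived from staging.
    cur.execute("CREATE TABLE users (user_id INTEGER PRIMARY KEY, first_name TEXT)")

    # Load step: on Redshift this would be a COPY from S3.
    cur.executemany("INSERT INTO staging_events VALUES (?, ?, ?)", rows)

    # Transform step: deduplicate users while moving from staging to analytics.
    cur.execute(
        """
        INSERT INTO users (user_id, first_name)
        SELECT DISTINCT user_id, first_name
        FROM staging_events
        WHERE user_id IS NOT NULL
        """
    )
    conn.commit()
    result = cur.execute(
        "SELECT user_id, first_name FROM users ORDER BY user_id"
    ).fetchall()
    conn.close()
    return result


sample_rows = [
    (1, "Ada", "Song A"),
    (1, "Ada", "Song B"),
    (2, "Lin", "Song C"),
    (None, None, "Song D"),  # event without a user is skipped
]
users = run_etl(sample_rows)
print(users)  # [(1, 'Ada'), (2, 'Lin')]
```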

Spark and Data Lakes

STEDI Human Balance Analytics

In this project,

  • Use Spark and AWS Glue to process data from multiple sources, categorize it, and curate it so that it can be queried later for multiple purposes.
  • Build a data lakehouse solution for sensor data that is used to train a machine learning model.
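
To make the "categorize, then curate" idea concrete, here is a small plain-Python sketch. It is not Spark or Glue code; it only mimics a filter step followed by a join step on in-memory records. The field names (`share_with_research`, `email`, `user`) and the functions `to_trusted` and `to_curated` are assumptions made for illustration.

```python
def to_trusted(customers):
    """Keep only customers who agreed to share their data (hypothetical field)."""
    return [c for c in customers if c.get("share_with_research")]


def to_curated(trusted_customers, readings):
    """Join sensor readings to trusted customers on a shared email key."""
    by_email = {c["email"]: c for c in trusted_customers}
    return [
        {**r, "customer_name": by_email[r["user"]]["name"]}
        for r in readings
        if r["user"] in by_email  # drop readings from non-consenting customers
    ]


customers = [
    {"email": "a@example.com", "name": "Ada", "share_with_research": True},
    {"email": "b@example.com", "name": "Lin", "share_with_research": False},
]
readings = [
    {"user": "a@example.com", "x": 0.1},
    {"user": "b@example.com", "x": 0.2},
]

curated = to_curated(to_trusted(customers), readings)
print(curated)
```

In a Spark job the same two steps would usually be expressed as a DataFrame filter followed by a join.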

Automate Data Pipelines

Data Pipelines with Airflow

In this project,

  • Use Airflow to create high-grade data pipelines that are dynamic, built from reusable tasks, easy to monitor, and easy to backfill.
  • Create custom operators to perform tasks such as staging the data, filling the data warehouse, and running checks on the data as the final step.
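
The final data-check step can be sketched without an Airflow installation. The class below is a plain-Python stand-in, not an actual Airflow operator: in Airflow, similar logic would typically live in the `execute` method of a `BaseOperator` subclass and use a database hook. The class name `DataQualityCheck`, the table names, and the use of SQLite in place of the warehouse are all assumptions for illustration.

```python
import sqlite3


class DataQualityCheck:
    """Stand-in for a custom operator that fails if any listed table is empty."""

    def __init__(self, conn, tables):
        self.conn = conn
        self.tables = tables

    def execute(self, context=None):
        # Mirrors the shape of an operator's execute(context) method.
        for table in self.tables:
            count = self.conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
            if count == 0:
                raise ValueError(f"Data quality check failed: {table} is empty")
        return True


# Hypothetical warehouse state: one populated table, one empty table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE songplays (id INTEGER)")
conn.execute("CREATE TABLE users (id INTEGER)")
conn.execute("INSERT INTO songplays VALUES (1)")

print(DataQualityCheck(conn, ["songplays"]).execute())  # True
```

Raising an exception on failure is what lets a scheduler mark the task as failed, so a check like this works as a gate at the end of the pipeline.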