AWS

A series of projects covering different aspects of AWS.

Serverless

Project 1: EC2 Lambda

Create a simple Lambda function and the relevant IAM roles to launch an EC2 instance using Python and Boto3. Test the instance by connecting to it over SSH.
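A minimal sketch of what such a handler could look like, assuming boto3 (the AWS SDK available in Lambda's Python runtimes) and an execution role that permits `ec2:RunInstances`. The event keys `ami_id` and `key_name` and the helper `build_run_instances_params` are hypothetical names chosen for illustration, not part of the project.

```python
def build_run_instances_params(ami_id, key_name, instance_type="t2.micro"):
    """Return keyword arguments for the EC2 run_instances call."""
    return {
        "ImageId": ami_id,
        "InstanceType": instance_type,
        "KeyName": key_name,  # key pair used later to connect over SSH
        "MinCount": 1,
        "MaxCount": 1,
    }


def lambda_handler(event, context):
    # boto3 is imported here so the helper above works without it installed.
    import boto3

    ec2 = boto3.client("ec2")
    # Assumption: the invoking event carries the AMI id and key pair name.
    params = build_run_instances_params(event["ami_id"], event["key_name"])
    response = ec2.run_instances(**params)
    return {"instance_id": response["Instances"][0]["InstanceId"]}
```

The key pair named in `KeyName` must already exist in the account, and the instance's security group must allow inbound traffic on port 22 for the SSH test to succeed.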

Data Engineering

Project 1: Data Transform

Data Engineering Udacity NanoDegree

Project 1: Data modeling with PostgreSQL

Model user activity data for a music streaming app called Sparkify. Create a relational database and ETL pipeline designed to optimize queries for understanding what songs users are listening to. Define fact and dimension tables in PostgreSQL and insert data into them.
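As a rough illustration of the fact and dimension layout, the sketch below defines one dimension table and one fact table that references it. The table and column names (`users`, `songplays`, and so on) are assumptions made for the example, and `conn` is assumed to be a psycopg2-style connection.

```python
# Dimension table: one row per user (hypothetical columns).
CREATE_USERS = """
CREATE TABLE IF NOT EXISTS users (
    user_id    INT PRIMARY KEY,
    first_name TEXT,
    last_name  TEXT,
    level      TEXT
);
"""

# Fact table: one row per song play, pointing at the dimension above.
CREATE_SONGPLAYS = """
CREATE TABLE IF NOT EXISTS songplays (
    songplay_id SERIAL PRIMARY KEY,
    start_time  TIMESTAMP NOT NULL,
    user_id     INT NOT NULL REFERENCES users (user_id),
    song_id     TEXT,
    session_id  INT
);
"""

# Upsert so that re-running the ETL keeps the latest subscription level.
INSERT_USER = """
INSERT INTO users (user_id, first_name, last_name, level)
VALUES (%s, %s, %s, %s)
ON CONFLICT (user_id) DO UPDATE SET level = EXCLUDED.level;
"""


def create_tables(conn):
    """Create the dimension first, then the fact table that references it."""
    with conn.cursor() as cur:
        for statement in (CREATE_USERS, CREATE_SONGPLAYS):
            cur.execute(statement)
    conn.commit()
```

Creating `users` before `songplays` matters here because the foreign key on the fact table needs the dimension table to exist.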

Project 2: Data modeling with Apache Cassandra

Model user activity data for a music streaming app called Sparkify. Create a NoSQL database and ETL pipeline designed to optimize queries for understanding what songs users are listening to. Model your data in Apache Cassandra to allow for specific queries provided by the analytics team at Sparkify.
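In Cassandra the table is usually designed around the query it must serve. The sketch below assumes a hypothetical query ("which song was played at a given position in a given session?") and hypothetical table and column names; `session` is assumed to be a connected Session from the DataStax Python driver.

```python
# One table per query: the partition key (session_id) and clustering
# column (item_in_session) match the WHERE clause of the query below.
CREATE_SONGS_BY_SESSION = """
CREATE TABLE IF NOT EXISTS songs_by_session (
    session_id      int,
    item_in_session int,
    artist          text,
    song            text,
    PRIMARY KEY (session_id, item_in_session)
);
"""

SELECT_SONG_BY_SESSION = """
SELECT artist, song FROM songs_by_session
WHERE session_id = %s AND item_in_session = %s;
"""


def fetch_song(session, session_id, item_in_session):
    """Return (artist, song) tuples for one position in one session."""
    rows = session.execute(SELECT_SONG_BY_SESSION, (session_id, item_in_session))
    return [(row.artist, row.song) for row in rows]
```

A different analytics question would typically get its own table with its own primary key rather than a secondary index on this one.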

Concepts: Data Modelling, Relational Data Models, NoSQL Data Models

Project 3: Build a Cloud Data Warehouse

Building on previous projects, build an ETL pipeline that extracts song data from S3, stages it in Redshift, and transforms it into a set of dimensional tables so the Sparkify analytics team can continue finding insights into what songs their users are listening to.
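The stage-then-transform flow might be sketched as follows: a Redshift `COPY` loads raw JSON from S3 into a staging table, and an `INSERT ... SELECT` fills a dimension table from it. The table names, columns, default region, and the helper `build_copy_statement` are assumptions for illustration only.

```python
def build_copy_statement(table, s3_path, iam_role_arn, region="us-west-2"):
    """Build a Redshift COPY statement that loads JSON files from S3."""
    return (
        f"COPY {table}\n"
        f"FROM '{s3_path}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        f"REGION '{region}'\n"
        "FORMAT AS JSON 'auto';"
    )


# Transform step: populate a dimension table from the staged rows
# (hypothetical table and column names).
INSERT_SONGS = """
INSERT INTO songs (song_id, title, artist_id, year, duration)
SELECT DISTINCT song_id, title, artist_id, year, duration
FROM staging_songs
WHERE song_id IS NOT NULL;
"""
```

For example, `build_copy_statement("staging_songs", "s3://example-bucket/song_data", role_arn)` yields a statement that can be run through any PostgreSQL-compatible client connected to the cluster, followed by `INSERT_SONGS`.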

Concepts: Data Warehouses, Cloud Computing with AWS, Implementing Data Warehouses on AWS

Project 4: Build a Data Lake

Build an ETL pipeline for a data lake. The data resides in S3: one directory holds JSON logs of user activity on the app, and another holds JSON metadata on the songs in the app. Load the data from S3, process it into analytics tables using Spark, and load the tables back into S3. Deploy this Spark process on a cluster using AWS.
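A minimal PySpark sketch of the load, process, and write-back steps, assuming the song metadata sits under a `song_data` prefix and has the columns shown. The prefix, column names, app name, and the helper `table_path` are hypothetical.

```python
def table_path(root, name):
    """Join an S3 prefix and a table or directory name into one location."""
    return f"{root.rstrip('/')}/{name}/"


def run_etl(input_root, output_root):
    # pyspark is imported here so table_path stays usable without it.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("sparkify-data-lake").getOrCreate()

    # Load: read the raw JSON song metadata from S3 (assumed layout).
    songs_raw = spark.read.json(table_path(input_root, "song_data"))

    # Process: keep the analytics columns and drop duplicate songs.
    songs = songs_raw.select(
        "song_id", "title", "artist_id", "year", "duration"
    ).dropDuplicates(["song_id"])

    # Write back: store the table in S3 as Parquet, partitioned by year.
    songs.write.mode("overwrite").partitionBy("year").parquet(
        table_path(output_root, "songs")
    )
    spark.stop()
```

On a cluster, a script like this would typically be submitted with `spark-submit`, with `input_root` and `output_root` pointing at `s3a://` or `s3://` locations depending on the environment.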

Concepts: Data Lakes, Cloud Computing with AWS, Spark
