A series of projects covering different aspects of AWS
Project 1: EC2 Lambda
Create a simple Lambda function, with the IAM roles it needs, to launch an EC2 instance using Python (Boto3). Verify the instance by connecting over SSH.
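A minimal sketch of such a handler, assuming Boto3 (which ships with the Lambda Python runtime); the AMI ID, key pair name, and event fields are placeholders, not values from the project:

```python
def build_run_instances_params(ami_id, instance_type="t2.micro", key_name=None):
    """Assemble the keyword arguments for ec2.run_instances."""
    params = {
        "ImageId": ami_id,
        "InstanceType": instance_type,
        "MinCount": 1,
        "MaxCount": 1,
    }
    if key_name:
        params["KeyName"] = key_name  # key pair needed later for SSH access
    return params

def lambda_handler(event, context):
    import boto3  # available in the Lambda runtime; imported lazily here
    ec2 = boto3.client("ec2")
    resp = ec2.run_instances(**build_run_instances_params(
        event["ami_id"], key_name=event.get("key_name")))
    # Return the new instance ID so the caller can look up its public IP.
    return {"InstanceId": resp["Instances"][0]["InstanceId"]}
```

The execution role attached to the function would need `ec2:RunInstances` permission.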
Project 2: Data Transform (a series of four sub-projects on data modeling and pipelines)
Project 1: Data modeling with PostgreSQL
Model user activity data for Sparkify, a music streaming app. Create a relational database and ETL pipeline optimized for queries that reveal which songs users are listening to. In PostgreSQL, define fact and dimension tables and insert the data into them.
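One possible star schema for this step, expressed as DDL strings that the ETL script would execute through a PostgreSQL driver such as psycopg2; the table and column names below are illustrative, not the project's exact schema:

```python
# Fact table: one row per song-play event.
songplay_table_create = """
CREATE TABLE IF NOT EXISTS songplays (
    songplay_id SERIAL PRIMARY KEY,
    start_time  TIMESTAMP NOT NULL,
    user_id     INT NOT NULL,
    song_id     VARCHAR,
    artist_id   VARCHAR,
    level       VARCHAR
);
"""

# Dimension table: one row per user.
user_table_create = """
CREATE TABLE IF NOT EXISTS users (
    user_id    INT PRIMARY KEY,
    first_name VARCHAR,
    last_name  VARCHAR,
    level      VARCHAR
);
"""

# Upsert so a user's subscription level stays current across log files.
user_table_insert = """
INSERT INTO users (user_id, first_name, last_name, level)
VALUES (%s, %s, %s, %s)
ON CONFLICT (user_id) DO UPDATE SET level = EXCLUDED.level;
"""
```

Queries for "what songs are users listening to" then join the narrow fact table to the dimensions rather than scanning raw logs.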
Project 2: Data modeling with Apache Cassandra
Model user activity data for Sparkify, a music streaming app. Create a NoSQL database and ETL pipeline optimized for queries that reveal which songs users are listening to. Model the data in Apache Cassandra to support the specific queries provided by the analytics team at Sparkify.
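In Cassandra the table is designed around the query rather than the other way around: the columns in the query's WHERE clause become the primary key. A sketch for a hypothetical analytics query ("what song was heard during session X, item Y"); the CQL strings would be run through the DataStax cassandra-driver's `session.execute`, and the names here are assumptions:

```python
# Table keyed by (session_id, item_in_session): session_id is the
# partition key, item_in_session the clustering column, so the query
# below hits exactly one partition.
create_songs_by_session = """
CREATE TABLE IF NOT EXISTS songs_by_session (
    session_id      INT,
    item_in_session INT,
    artist          TEXT,
    song_title      TEXT,
    length          FLOAT,
    PRIMARY KEY (session_id, item_in_session)
);
"""

# The query the table was modeled for.
select_by_session = """
SELECT artist, song_title, length
FROM songs_by_session
WHERE session_id = %s AND item_in_session = %s;
"""
```

A different analytics question would get its own table with a different primary key, duplicating data by design.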
Concepts: Data Modeling, Relational Data Models, NoSQL Data Models
Project 3: Build a Cloud Data Warehouse
Building on the previous projects, build an ETL pipeline that extracts song data from S3, stages it in Redshift, and transforms it into a set of dimensional tables for the analytics team to keep finding insights into what songs users are listening to.
Concepts: Data Warehouses, Cloud Computing with AWS, Implementing Data Warehouses on AWS
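The two Redshift stages above can be sketched as SQL strings: a COPY that bulk-loads raw JSON from S3 into a staging table, then an INSERT ... SELECT that transforms staged rows into a dimensional table. The bucket, IAM role ARN, and table/column names are placeholders:

```python
# Stage 1: bulk-load raw JSON event logs from S3 into Redshift.
staging_events_copy = """
COPY staging_events
FROM 's3://example-bucket/log_data'
IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-s3-read'
FORMAT AS JSON 's3://example-bucket/log_json_path.json'
REGION 'us-west-2';
"""

# Stage 2: transform staged rows into a users dimension table.
user_table_insert = """
INSERT INTO users (user_id, first_name, last_name, level)
SELECT DISTINCT userId, firstName, lastName, level
FROM staging_events
WHERE userId IS NOT NULL;
"""
```

Doing the transform inside Redshift keeps the data movement to a single COPY per source, which is what the staging step buys you.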
Project 4: Build a Data Lake
Build an ETL pipeline for a data lake. The data resides in S3: a directory of JSON logs of user activity in the app, plus a directory of JSON metadata on the app's songs. Load the data from S3, process it into analytics tables using Spark, and write those tables back to S3. Deploy the Spark job on an AWS cluster.
Concepts: Data Lakes, Cloud Computing with AWS, Spark
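A hedged PySpark sketch of the S3-to-S3 step, assuming a `SparkSession` configured for S3 access is passed in; the bucket names, S3 key layout, and column names are placeholders:

```python
def parquet_output_path(output_bucket, table):
    """Build the S3 destination path for an analytics table."""
    return f"s3a://{output_bucket}/{table}/"  # Spark appends partition dirs

def process_song_data(spark, input_bucket, output_bucket):
    # Read the raw JSON song metadata from S3.
    df = spark.read.json(f"s3a://{input_bucket}/song_data/*/*/*/*.json")

    # Project out the songs analytics table, one row per song.
    songs = df.select("song_id", "title", "artist_id", "year", "duration") \
              .dropDuplicates(["song_id"])

    # Write back to S3 as Parquet, partitioned so queries can prune files.
    songs.write.mode("overwrite") \
         .partitionBy("year", "artist_id") \
         .parquet(parquet_output_path(output_bucket, "songs"))
```

Running the same script on an EMR cluster instead of locally only changes where `spark` comes from, not the pipeline logic.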