/Build-a-Data-Lake-Using-AWS-S3-Spark-Cluster

In this project, we'll build an ETL pipeline for a data lake. The data resides in S3: a directory of JSON logs of user activity in the app, and a directory of JSON metadata on the songs in the app. We will load the data from S3, process it into analytics tables using Spark, and write those tables back to S3. We will then deploy this Spark process on a cluster using AWS.
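
The extract, transform, and load steps described above could look roughly like the PySpark sketch below. It is a minimal illustration, not the project's actual code: the function names, the `song_data` and `log_data` directory names, the column names (`song_id`, `title`, `user_id`, `level`), the `s3a://` paths, and the choice of Parquet as the output format are all assumptions made for the example.

```python
def create_spark_session():
    """Build a SparkSession (hypothetical helper).

    pyspark is imported lazily so this module can be loaded without it.
    """
    from pyspark.sql import SparkSession

    return SparkSession.builder.appName("data-lake-etl").getOrCreate()


def run_etl(spark, input_root, output_root):
    """Read raw JSON, build two analytics tables, and write them back out.

    All directory and column names here are assumed for illustration.
    """
    # Extract: read the two JSON directories from the input location.
    songs_raw = spark.read.json(f"{input_root}/song_data")
    logs_raw = spark.read.json(f"{input_root}/log_data")

    # Transform: keep a few columns and one row per key (assumed columns).
    songs = songs_raw.select("song_id", "title").dropDuplicates(["song_id"])
    users = logs_raw.select("user_id", "level").dropDuplicates(["user_id"])

    # Load: write each table to the output location (Parquet is an assumption).
    songs_path = f"{output_root}/songs"
    users_path = f"{output_root}/users"
    songs.write.mode("overwrite").parquet(songs_path)
    users.write.mode("overwrite").parquet(users_path)

    return [songs_path, users_path]
```

With a running cluster, this would be invoked as something like `run_etl(create_spark_session(), "s3a://example-input-bucket", "s3a://example-output-bucket")`, where both bucket names are placeholders. Passing the session in as an argument keeps the transformation logic separate from cluster setup, so the same function can run locally or on AWS.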

Primary Language: Jupyter Notebook