BenitaDiop/FullStackBigData-with-SPARK

Pulled 10 GB of Yelp business data through the terminal via the Kaggle API. The data was then pushed to an AWS S3 bucket for storage and analyzed with PySpark in a Jupyter Notebook running on an Elastic MapReduce (EMR) cluster.

Analyzing 10GB of Yelp Data on AWS EMR

Leveraging PySpark, Python, Spark, SQL, SparkR, R, and Bash
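
The download, upload, and load steps described above could be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: the Kaggle dataset slug, the local directory, the bucket URI, and the business file name are all assumptions, and the helper function names are hypothetical. It also assumes the `kaggle` and `aws` command-line tools are installed with credentials configured.

```python
import subprocess

# Assumed values for illustration only; none of these come from the repository.
DATASET = "yelp-dataset/yelp-dataset"   # Kaggle dataset slug (assumption)
LOCAL_DIR = "yelp_data"                 # local download directory (hypothetical)
BUCKET_URI = "s3://my-yelp-bucket/raw"  # S3 destination (hypothetical)


def kaggle_download_cmd(dataset, dest):
    """Build the Kaggle CLI call that downloads and unzips a dataset."""
    return ["kaggle", "datasets", "download", "-d", dataset, "-p", dest, "--unzip"]


def s3_upload_cmd(src, bucket_uri):
    """Build the AWS CLI call that copies a local directory to S3."""
    return ["aws", "s3", "cp", src, bucket_uri, "--recursive"]


def run_pipeline(run=subprocess.run):
    """Run the download and the upload in order, stopping on the first failure."""
    steps = (
        kaggle_download_cmd(DATASET, LOCAL_DIR),
        s3_upload_cmd(LOCAL_DIR, BUCKET_URI),
    )
    for cmd in steps:
        run(cmd, check=True)


def load_business(spark, bucket_uri=BUCKET_URI):
    """Read the business file into a DataFrame using an existing SparkSession.

    The file name is an assumption about how the dataset is laid out.
    """
    return spark.read.json(f"{bucket_uri}/yelp_academic_dataset_business.json")
```

In an EMR notebook with a PySpark kernel, a `spark` session is normally already available, so `load_business(spark)` would be called from a notebook cell after `run_pipeline()` has been run once from a machine with both CLIs configured.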

AWS Cluster Configuration

AWS Notebook Configuration