FullStackBigData-with-SPARK

Pulled 10GB of Yelp business data through the terminal via the Kaggle API. The data was then pushed to an AWS S3 bucket for storage and analyzed with PySpark in a Jupyter Notebook running on an Amazon Elastic MapReduce (EMR) cluster.

Primary language: Jupyter Notebook

Analyzing 10GB of Yelp Data on AWS EMR

Leveraging PySpark, Python, Spark, SQL, SparkR, R, and Bash

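The download and upload steps described above can be scripted from the terminal. The sketch below is a minimal, hypothetical helper, not the repository's own code. It assumes the Kaggle CLI (`kaggle`) and the AWS CLI (`aws`) are installed and configured with credentials, assumes the Kaggle dataset slug `yelp-dataset/yelp-dataset`, and uses a made-up bucket name and key prefix. By default it only prints the commands (a dry run).

```python
from __future__ import annotations

import shlex
import subprocess
from pathlib import Path

# Assumption: Kaggle slug of the public Yelp dataset (not named in this README).
KAGGLE_DATASET = "yelp-dataset/yelp-dataset"


def s3_uri(bucket: str, prefix: str, filename: str = "") -> str:
    """Join a bucket, key prefix, and optional file name into an s3:// URI."""
    parts = [p.strip("/") for p in (prefix, filename) if p.strip("/")]
    return f"s3://{bucket}/{'/'.join(parts)}"


def kaggle_download_cmd(dest: Path) -> list[str]:
    """Kaggle CLI command that downloads the dataset into dest and unzips it."""
    return ["kaggle", "datasets", "download", KAGGLE_DATASET,
            "-p", str(dest), "--unzip"]


def s3_upload_cmd(src: Path, bucket: str, prefix: str) -> list[str]:
    """AWS CLI command that copies every file under src to s3://bucket/prefix."""
    return ["aws", "s3", "cp", str(src), s3_uri(bucket, prefix), "--recursive"]


def run_pipeline(dest: Path, bucket: str, prefix: str,
                 dry_run: bool = True) -> list[list[str]]:
    """Print (and, unless dry_run is True, execute) the download and upload steps."""
    cmds = [kaggle_download_cmd(dest), s3_upload_cmd(dest, bucket, prefix)]
    for cmd in cmds:
        print(shlex.join(cmd))
        if not dry_run:
            # check=True stops the pipeline if a step exits with an error.
            subprocess.run(cmd, check=True)
    return cmds


if __name__ == "__main__":
    # Hypothetical local directory, bucket, and prefix; replace with your own.
    run_pipeline(Path("yelp_data"), "my-yelp-bucket", "raw/yelp")
```

Once the files are in S3, a PySpark notebook on the EMR cluster can load them with `spark.read.json(...)`, passing a URI built the same way, e.g. `s3_uri("my-yelp-bucket", "raw/yelp", "yelp_academic_dataset_business.json")` (that file name is an assumption about the Kaggle archive's contents). Building the commands as argument lists rather than shell strings avoids quoting problems when paths contain spaces.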


AWS Cluster Configuration

[Image: cluster configuration]

AWS Notebook Configuration

[Image: notebook configuration]