Apache-Spark-Projects

This repo contains all the projects I did using Apache Spark.

Primary language: Jupyter Notebook

The projects are as follows:

1. Ratings Counter

In this project we use Spark to count how many times each rating value occurs. The ratings come from the popular MovieLens 100k dataset, which contains 100,000 movie ratings.

  • Concept used: RDD (Resilient Distributed Dataset)
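
A minimal sketch of how such a count could be done with the RDD API. It assumes the standard MovieLens 100k layout, where each line of `u.data` is tab-separated as user id, movie id, rating, timestamp. The path `ml-100k/u.data` and the helper names `parse_rating` and `count_ratings` are hypothetical and may not match the notebook in this repo.

```python
from collections import OrderedDict


def parse_rating(line):
    """Return the rating field of one tab-separated line.

    Assumed layout: user id, movie id, rating, timestamp.
    """
    return line.split("\t")[2]


def count_ratings(lines_rdd):
    """Count how often each rating value occurs in an RDD of raw lines."""
    # countByValue() is an action: it returns a dict-like object to the driver.
    counts = lines_rdd.map(parse_rating).countByValue()
    return OrderedDict(sorted(counts.items()))


if __name__ == "__main__":
    # pyspark is imported here so the helpers above can be used without it.
    from pyspark import SparkConf, SparkContext

    conf = SparkConf().setMaster("local[*]").setAppName("RatingsCounter")
    sc = SparkContext(conf=conf)

    lines = sc.textFile("ml-100k/u.data")  # hypothetical path
    for rating, count in count_ratings(lines).items():
        print(rating, count)

    sc.stop()
```

Since there are only a handful of distinct rating values, collecting the counts on the driver with `countByValue()` is cheap here. For a key space that is large, `reduceByKey` would keep the result distributed.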

2. Book Analysis

In this project we use Spark to find different properties of two books. The "book" and "book1" datasets are two arbitrary texts, analysed to generate a report with stats such as "Number of words present in the book", "Common words in both the books", and "Most frequent words". Both files are in the dataset folder of this repo.

  • Concept used: RDD (Resilient Distributed Dataset)
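
The three stats named above could be computed with RDD operations roughly as follows. This is a sketch under assumptions: the paths `dataset/book` and `dataset/book1`, the tokenization rule, and the helper names `tokenize` and `word_counts` are illustrative and not taken from the notebook.

```python
import re


def tokenize(line):
    """Lowercase a line and split it into words (letters and apostrophes only)."""
    return re.findall(r"[a-z']+", line.lower())


def word_counts(lines_rdd):
    """Return an RDD of (word, count) pairs for an RDD of text lines."""
    return (
        lines_rdd.flatMap(tokenize)
        .map(lambda word: (word, 1))
        .reduceByKey(lambda a, b: a + b)
    )


if __name__ == "__main__":
    # pyspark is imported here so the helpers above can be used without it.
    from pyspark import SparkContext

    sc = SparkContext("local[*]", "BookAnalysis")

    # Hypothetical paths; adjust to the actual file names in the dataset folder.
    counts = word_counts(sc.textFile("dataset/book")).cache()
    counts1 = word_counts(sc.textFile("dataset/book1")).cache()

    # Number of words present in the book
    print("Words in book:", counts.values().sum())

    # Most frequent words (top 10 by descending count)
    print("Most frequent:", counts.takeOrdered(10, key=lambda kv: -kv[1]))

    # Common words in both the books
    common = counts.keys().intersection(counts1.keys())
    print("Common words:", common.count())

    sc.stop()
```

A real analysis would probably also drop stop words such as "the" and "and" before ranking, otherwise they dominate the most-frequent list.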

3. Airbnb Average Price Analyzer

In this project we use Spark to extract insights from the 2019 Airbnb NYC dataset. The analysis generates a report with stats such as "Price per region" and "The maximum and minimum amount of revenue per region". The dataset, "AB_NYC_2019.csv", is in the "Airbnb Average Price Analyzer" folder of this repo, along with the .ipynb file.

  • Concept used: Spark SQL, Spark DataFrame
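
A per-region price summary could look roughly like this with the DataFrame reader and Spark SQL. It is a sketch, not the notebook's code: it assumes the CSV has `neighbourhood_group` and `price` columns (as in the public AB_NYC_2019 file), treats `neighbourhood_group` as the "region", and reports only listing prices because the description does not say how revenue is derived. The names `PRICE_BY_REGION_SQL`, `price_by_region`, and the view name `listings` are hypothetical.

```python
# Average, maximum, and minimum listing price per region.
PRICE_BY_REGION_SQL = """
SELECT neighbourhood_group AS region,
       ROUND(AVG(price), 2) AS avg_price,
       MAX(price) AS max_price,
       MIN(price) AS min_price
FROM listings
GROUP BY neighbourhood_group
ORDER BY avg_price DESC
"""


def price_by_region(spark, csv_path):
    """Load the Airbnb CSV and return a DataFrame of price stats per region."""
    df = (
        spark.read.option("header", True)
        .option("multiLine", True)  # listing names may contain line breaks
        .option("escape", '"')      # quotes inside quoted fields are doubled
        .csv(csv_path)
    )
    # Without schema inference every column is a string, so cast price
    # explicitly and drop rows where it could not be parsed.
    df = df.withColumn("price", df["price"].cast("double")).dropna(subset=["price"])
    df.createOrReplaceTempView("listings")
    return spark.sql(PRICE_BY_REGION_SQL)


if __name__ == "__main__":
    # pyspark is imported here so the module can be inspected without it.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder.master("local[*]")
        .appName("AirbnbAveragePriceAnalyzer")
        .getOrCreate()
    )
    price_by_region(spark, "Airbnb Average Price Analyzer/AB_NYC_2019.csv").show()
    spark.stop()
```

The same aggregation can be written with `df.groupBy("neighbourhood_group").agg(...)`; the SQL form is used here only because the project lists Spark SQL as a concept.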