Restaurant-Recommender-System-AWS-Hadoop-MapReduce

Academic project for the Advanced Database Management Systems (Big Data) course.

Primary Language: Java

Restaurant Recommendation Engine: Content-Based and Personalized (Yelp Dataset)

Description

  • Developed a content-based recommender that recommends restaurants to users.
  • Extracted, preprocessed, and cleaned the restaurant-related data from the Yelp academic dataset.
  • Implemented MapReduce design patterns (filtering, summarization, data organization, and join patterns) to compute analyses such as top restaurants by country and state, total restaurants by country and state, the moving average rating of restaurants, top restaurants by positive reviews, and the minimum and maximum review count of each restaurant.
  • Performed sentiment analysis on the restaurant reviews written by Yelp users.
  • Calculated the Pearson correlation, Jaccard similarity, and cosine similarity between restaurants to generate recommendations for users.
  • Used binning to partition the data source by a preset value of a column, and Bloom filtering to filter restaurants by the city they are located in.
  • Deployed the project on AWS EC2 with 4 instances (a NameNode, a Secondary NameNode, and two DataNodes) so that processing is distributed across multiple nodes.
  • Visualized the analysis in Power BI.
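The three similarity measures named above can be illustrated with a minimal, in-memory Java sketch. It assumes each restaurant is represented as a map from user id to the star rating that user gave it; the class `RestaurantSimilarity` and its methods are hypothetical names, not the project's code, and in the project this logic would run inside MapReduce jobs rather than in a single JVM. Pearson is computed over co-rating users only, Jaccard over the sets of users who rated each restaurant, and cosine over the full rating vectors with missing ratings treated as 0.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: similarity between two restaurants, each given as
// an assumed map of userId -> star rating.
final class RestaurantSimilarity {

    private RestaurantSimilarity() {}

    // Pearson correlation over the users who rated both restaurants.
    static double pearson(Map<String, Double> a, Map<String, Double> b) {
        Set<String> common = new HashSet<>(a.keySet());
        common.retainAll(b.keySet());
        int n = common.size();
        if (n < 2) return 0.0; // too little overlap to correlate
        double meanA = 0, meanB = 0;
        for (String u : common) { meanA += a.get(u); meanB += b.get(u); }
        meanA /= n;
        meanB /= n;
        double num = 0, varA = 0, varB = 0;
        for (String u : common) {
            double da = a.get(u) - meanA, db = b.get(u) - meanB;
            num += da * db;
            varA += da * da;
            varB += db * db;
        }
        if (varA == 0 || varB == 0) return 0.0; // constant ratings: undefined
        return num / Math.sqrt(varA * varB);
    }

    // Jaccard similarity: |A intersect B| / |A union B|.
    static double jaccard(Set<String> a, Set<String> b) {
        if (a.isEmpty() && b.isEmpty()) return 0.0;
        Set<String> inter = new HashSet<>(a);
        inter.retainAll(b);
        Set<String> union = new HashSet<>(a);
        union.addAll(b);
        return (double) inter.size() / union.size();
    }

    // Cosine similarity; a user missing from one map counts as a 0 rating.
    static double cosine(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0, normA = 0, normB = 0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            normA += e.getValue() * e.getValue();
            Double other = b.get(e.getKey());
            if (other != null) dot += e.getValue() * other;
        }
        for (double v : b.values()) normB += v * v;
        if (normA == 0 || normB == 0) return 0.0;
        return dot / Math.sqrt(normA * normB);
    }

    public static void main(String[] args) {
        Map<String, Double> r1 = Map.of("u1", 5.0, "u2", 3.0, "u3", 4.0);
        Map<String, Double> r2 = Map.of("u1", 4.0, "u2", 2.0, "u4", 5.0);
        System.out.println("pearson = " + pearson(r1, r2));
        System.out.println("jaccard = " + jaccard(r1.keySet(), r2.keySet()));
        System.out.println("cosine  = " + cosine(r1, r2));
    }
}
```

Restricting Pearson to co-rating users and returning 0 when fewer than two users overlap (or a variance is zero) is one common convention; other treatments of sparse overlap are equally reasonable.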

Code

  1. Average rating and total restaurants by cuisine
  2. Content based recommendation
  3. Elite users based on useful votes
  4. Minimum, maximum, and total review count
  5. Restaurants by star
  6. Restaurant search using bloom filtering
  7. Sentiment analysis of user reviews
  8. Sentiment analysis of user reviews by restaurants
  9. Simple moving average rating of restaurants
  10. Tip at restaurants
  11. Top 10 restaurants by positive reviews
  12. Top restaurants by state
  13. Total and average rating of restaurants by country
  14. Total restaurants by state
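Restaurant search with Bloom filtering (item 6) relies on a bit array that answers "possibly present" or "definitely absent" for a key. The sketch below is a standalone, stdlib-only illustration under assumed parameters (4096 bits, 3 hash functions, case-insensitive city names); `CityBloomFilter` is a hypothetical name, not the project's class. In a Hadoop job, Hadoop's own `org.apache.hadoop.util.bloom.BloomFilter` could play the same role, typically built once from the target cities and made available to every mapper.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;
import java.util.Locale;

// Hypothetical stdlib-only Bloom filter over city names.
final class CityBloomFilter {
    private final BitSet bits;
    private final int numBits;
    private final int numHashes;

    CityBloomFilter(int numBits, int numHashes) {
        if (numBits <= 0 || numHashes <= 0) throw new IllegalArgumentException("sizes must be positive");
        this.bits = new BitSet(numBits);
        this.numBits = numBits;
        this.numHashes = numHashes;
    }

    // i-th bit position via double hashing: (h1 + i * h2) mod numBits.
    private int index(byte[] key, int i) {
        int h1 = Arrays.hashCode(key);
        int h2 = fnv1a(key) | 1; // odd h2 keeps probes distinct when numBits is a power of two
        return Math.floorMod(h1 + i * h2, numBits);
    }

    // 32-bit FNV-1a hash of the key bytes.
    private static int fnv1a(byte[] data) {
        int h = 0x811c9dc5;
        for (byte b : data) {
            h ^= (b & 0xff);
            h *= 0x01000193;
        }
        return h;
    }

    // Normalizes names so "Phoenix" and " phoenix" set the same bits.
    private static byte[] normalize(String city) {
        return city.trim().toLowerCase(Locale.ROOT).getBytes(StandardCharsets.UTF_8);
    }

    void add(String city) {
        byte[] key = normalize(city);
        for (int i = 0; i < numHashes; i++) bits.set(index(key, i));
    }

    // false: the city was definitely never added; true: it probably was.
    boolean mightContain(String city) {
        byte[] key = normalize(city);
        for (int i = 0; i < numHashes; i++) {
            if (!bits.get(index(key, i))) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        CityBloomFilter filter = new CityBloomFilter(4096, 3);
        for (String city : List.of("Las Vegas", "Phoenix", "Toronto")) filter.add(city);
        // Mapper-style check: keep a record only if its city might be in the set.
        for (String city : List.of("phoenix", "Pittsburgh")) {
            System.out.println(city + " -> " + filter.mightContain(city));
        }
    }
}
```

Because a Bloom filter can return false positives but never false negatives, a mapper can safely drop any record whose city fails `mightContain`; records that pass may still need an exact comparison if false positives matter.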

Data Preprocessing and Cleansing

Data Visualization

  • Total Restaurants by Cuisine

  • Top 10 Restaurants

  • Average Rating of Restaurants by Cuisine

  • Positive and Negative Review Count of Restaurants

  • Moving Average of a Restaurant
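The moving-average chart, like the "Simple moving average rating of restaurants" code item, is based on a simple moving average: the mean of the last k ratings of one restaurant, ordered by review date. A minimal sketch of the per-restaurant computation follows, assuming the ratings arrive already sorted by date (in MapReduce this would usually come from a secondary sort on the date inside the reducer); `SimpleMovingAverage` and the window size of 3 are illustrative assumptions, not values taken from the project.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical sketch of a simple moving average over date-ordered ratings.
final class SimpleMovingAverage {

    private SimpleMovingAverage() {}

    // Emits one average per position once `window` ratings have been seen.
    static List<Double> of(List<Double> ratingsByDate, int window) {
        if (window <= 0) throw new IllegalArgumentException("window must be positive");
        Deque<Double> buffer = new ArrayDeque<>();
        List<Double> out = new ArrayList<>();
        double sum = 0;
        for (double r : ratingsByDate) {
            buffer.addLast(r);
            sum += r;
            // Drop the oldest rating so the buffer holds at most `window` values.
            if (buffer.size() > window) sum -= buffer.removeFirst();
            if (buffer.size() == window) out.add(sum / window);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Double> stars = List.of(4.0, 5.0, 3.0, 2.0, 4.0);
        System.out.println(of(stars, 3));
    }
}
```

Keeping a running sum makes each step O(1) regardless of the window size, which suits a reducer that streams through one restaurant's reviews.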

Programming Language

Java, R

Technologies

Hadoop, HDFS, MapReduce, AWS EC2, Ubuntu

Tools/IDE

Eclipse, RStudio, WinSCP, PuTTY, PuTTYgen, Power BI