We have devised a recommendation system for restaurants based on the Yelp Dataset Challenge.
Our aim is to predict whether a given user will like a given restaurant, based on the characteristics of the restaurant, the user’s taste (derived from their previous reviews), the opinion of similar users (who gave similar votes to similar restaurants), and the trustworthiness of the reviews.
To do so, we studied three papers that we used as the basis of our work:
1. Restaurant Recommendation System, Ashish Gandhe;
2. Machine Learning and Visualization with Yelp Dataset, Zhiwei Zhang (with her repo);
3. Recommendation for yelp users itself, Wenqi Hou, Gauravi Saha, Manying Tsang.
We started from the data cleaning performed by Hou, Saha and Tsang (3). We then applied the SVM model proposed by Zhang (2) to identify fake reviews and assigned each review a “truth score”, an indicator of its trustworthiness. This score serves as a weight in the computation of the historical features described by Gandhe (1).
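To illustrate how a truth score can act as a weight, here is a minimal sketch of one plausible historical feature: a user's average star rating, with each past review weighted by its truth score. The field names `stars` and `truth_score`, the function name and the sample values are assumptions made for this example, not the project's actual schema or data.

```python
def weighted_average_stars(reviews, default=None):
    """Average star rating, with each review weighted by its truth score.

    `reviews` is a list of dicts with hypothetical keys "stars" and
    "truth_score" (assumed here to lie in [0, 1]).
    """
    total_weight = sum(r["truth_score"] for r in reviews)
    if total_weight == 0:
        # No trustworthy history is available: fall back to the default.
        return default
    weighted_sum = sum(r["stars"] * r["truth_score"] for r in reviews)
    return weighted_sum / total_weight


# Illustrative values only: the 1-star review has a low truth score,
# so it barely affects the weighted average.
past_reviews = [
    {"stars": 5, "truth_score": 0.9},
    {"stars": 1, "truth_score": 0.1},
    {"stars": 4, "truth_score": 1.0},
]

weighted = weighted_average_stars(past_reviews)
unweighted = sum(r["stars"] for r in past_reviews) / len(past_reviews)
print(f"weighted: {weighted:.2f}, unweighted: {unweighted:.2f}")
```

In this toy case the weighted average is about 4.3 while the plain average is about 3.3, because the suspicious 1-star review contributes very little.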
After some further preprocessing, we applied three machine learning models to the resulting dataset (a Support Vector Machine, a Random Forest and a Neural Network) and obtained our predictions from them.
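The modelling step could look roughly like the sketch below, which uses scikit-learn. This is an assumption: the text does not say which library, hyperparameters or evaluation metric were used. The data is a synthetic stand-in generated with `make_classification`, not the Yelp dataset.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic placeholder for the preprocessed (user, restaurant) features;
# y = 1 stands for "the user liked the restaurant".
X, y = make_classification(n_samples=400, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Hyperparameters are illustrative defaults, not the ones used in the project.
models = {
    "svm": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "neural_network": make_pipeline(
        StandardScaler(),
        MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0),
    ),
}

# Fit each model and record its accuracy on the held-out split.
accuracy = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    accuracy[name] = model.score(X_test, y_test)

for name, acc in accuracy.items():
    print(f"{name}: {acc:.3f}")
```

The SVM and the neural network are wrapped in a pipeline with `StandardScaler` because both are sensitive to feature scale, whereas the random forest is not.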