Fake-Review-Detection-System

Our project mainly focused on the review, user, and business datasets from the Yelp open data source. For reviews, we kept the unique id of each review, the user id of the person who wrote the review, the business id of the restaurant the review was written for, the review content, the rating given in the review, and the date when the review was written. We also added one geolocation attribute to the review data, indicating the location of the restaurant the review refers to. From the user.json file, we kept the user id, the number of reviews the user has written, the time since the user joined Yelp, and the average rating of the reviews the user has written. Last but not least, from the business dataset we included only the business id, name, business categories, geolocation information (city, state, postal code, latitude, longitude), price range, rating, number of reviews, and whether the restaurant is open or not.

The model is trained on labelled fake and true reviews. Support Vector Machine classifier, pre-processing: we used a feature extraction module called Tfidf Vectorizer, which transforms the reviews into a large sparse matrix in which each row represents a review, each column represents a word, and each cell holds that word's TF-IDF weight, that is, how often the word appears in the review, scaled down for words that are common across all reviews.
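The pre-processing and classification step described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual notebook code: it assumes scikit-learn's `TfidfVectorizer` and `LinearSVC` (the description does not say which SVM variant or kernel was used), and the four toy reviews and the label encoding are made up for the example.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

# Hypothetical toy data; the project's labelled Yelp reviews are not reproduced here.
reviews = [
    "Best restaurant ever, amazing amazing amazing, five stars!!!",
    "Unbelievable deal, best food in town, visit now, best best best",
    "The pasta was fine but service was slow on a busy Friday night.",
    "Decent tacos, a bit pricey, and parking nearby was hard to find.",
]
labels = [1, 1, 0, 0]  # assumed encoding: 1 = fake, 0 = true

# Turn the reviews into a sparse matrix: one row per review,
# one column per vocabulary word, each cell a TF-IDF weight.
vectorizer = TfidfVectorizer(lowercase=True, stop_words="english")
features = vectorizer.fit_transform(reviews)

# Fit a linear support vector machine on the labelled reviews.
# LinearSVC is an assumption; a kernel SVM (sklearn.svm.SVC) would also fit the description.
model = LinearSVC()
model.fit(features, labels)

# New reviews must go through the same fitted vectorizer before prediction.
new_features = vectorizer.transform(["Amazing amazing best best five stars"])
prediction = model.predict(new_features)
print(features.shape, prediction)
```

Fitting the vectorizer only on the training reviews and reusing it with `transform` for unseen text keeps the vocabulary and the inverse document frequencies consistent between training and prediction.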

Primary language: Jupyter Notebook

This repository is not active