This repository uses a subset of the Yelp dataset to develop deep learning models for sentiment classification and link prediction, and to build a recommender system.
This project addresses multiclass sentiment classification on a subset of the Yelp dataset. The data consists of review text, the per-review attributes ‘funny’, ‘cool’ and ‘useful’, and a star rating from 1 to 5 that serves as the label. Two modeling approaches are compared: 1) a heavyweight, feature-engineering-based ensemble model and 2) a model based on contextualized word representations (BERT).
- Ensemble | model | motivation
- BERT | model | motivation
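As a rough illustration of the data layout described above, the sketch below turns one review record into a text, a small vector of extra attributes, and a class label. The field names (`text`, `stars`, `funny`, `cool`, `useful`), the helper `to_example`, and the choice to shift stars 1 to 5 down to labels 0 to 4 are assumptions made for this example; the notebooks may prepare the data differently.

```python
def to_example(review):
    """Turn one review record into (text, extra_features, label).

    Assumes the record is a dict with 'text' and 'stars' keys and
    optional 'funny', 'cool' and 'useful' counts (hypothetical layout).
    """
    stars = int(review["stars"])
    if not 1 <= stars <= 5:
        raise ValueError(f"stars must be in 1..5, got {stars}")
    # Vote counts used as extra numeric features next to the text.
    extra = [review.get("funny", 0), review.get("cool", 0), review.get("useful", 0)]
    # Shift to zero-based class indices, as most classifiers expect.
    label = stars - 1
    return review["text"], extra, label


sample = {"text": "Great tacos, slow service.", "stars": 4, "cool": 2, "useful": 1}
text, extra, label = to_example(sample)
```

Here a 4-star review maps to class index 3, and a missing attribute such as `funny` defaults to 0.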
This project addresses link prediction on a subset of the Yelp dataset. The data consists of user_id and friends fields, which together define a directed graph whose edges serve as the labels. Two random-walk-based embedding algorithms are compared: 1) DeepWalk and 2) node2vec.
- DeepWalk | model | motivation
- Node2Vec | model | motivation
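Both algorithms start by sampling random walks over the friendship graph and then learn embeddings from those walks. The sketch below shows only the walk-sampling step with uniform transitions, in the style of DeepWalk; node2vec would instead bias each step with its return and in-out parameters. The helper names, the `(user_id, friends)` record layout, and the default parameters are assumptions for this example, not the notebooks' actual code.

```python
import random


def build_graph(records):
    """Build a directed adjacency list from (user_id, friends) records."""
    graph = {}
    for user_id, friends in records:
        graph.setdefault(user_id, [])
        for friend in friends:
            graph[user_id].append(friend)
            graph.setdefault(friend, [])  # make sure every node has an entry
    return graph


def random_walks(graph, num_walks=2, walk_length=5, seed=0):
    """Sample uniform random walks, num_walks starting from each node."""
    rng = random.Random(seed)
    nodes = list(graph)
    walks = []
    for _ in range(num_walks):
        rng.shuffle(nodes)  # visit start nodes in a fresh order each pass
        for start in nodes:
            walk = [start]
            while len(walk) < walk_length:
                neighbors = graph[walk[-1]]
                if not neighbors:  # dead end: no outgoing edges
                    break
                walk.append(rng.choice(neighbors))
            walks.append(walk)
    return walks


records = [("u1", ["u2", "u3"]), ("u2", ["u3"]), ("u3", ["u1"])]
graph = build_graph(records)
walks = random_walks(graph)
```

The resulting walks can be treated like sentences and passed to a skip-gram model to obtain node embeddings, which a downstream classifier can then use to score candidate edges.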
This project addresses rating prediction on a subset of the Yelp dataset. The data consists of users and businesses, with each rating recording the score a user gave a business. Individual user and business attributes are also available to supplement these ratings. The Wide and Deep model (WDL) was used because it outperformed the Neural Collaborative Filtering model (NCF) in our analysis, with a validation RMSE (Root Mean Squared Error) of 0.9996 versus 1.0533 for NCF.
- NCF | model | motivation
- WDL | model | motivation
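For reference, the RMSE used to compare the two models is the square root of the mean squared difference between predicted and actual ratings. The sketch below computes it in plain Python; the function name `rmse` and the toy ratings are illustrative assumptions and are unrelated to the reported validation scores.

```python
import math


def rmse(y_true, y_pred):
    """Root Mean Squared Error between two equal-length rating lists."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Toy ratings for illustration only.
actual = [5, 3, 4, 1]
predicted = [4, 3, 5, 2]
score = rmse(actual, predicted)
```

Lower is better: an RMSE near 1.0 means predictions are off by roughly one star on average, in the root-mean-square sense.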
All notebooks can be downloaded and run on Google Colab.
Grateful to the Big Data Mining and Management faculty.