This repository was also used for a class project in Stanford's CS224W: Machine Learning with Graphs. Specifically, simple_make_dataset was added to generate data from the OAG graph, random_walks was modified to fit the OAG data format, and a file was added to calculate the results of the random walk.
This project explores several methods for bipartite link prediction on Yelp's Dataset Challenge dataset. In particular, we try to predict which restaurants a given user will review. This was done as a class project for Stanford's Social and Information Network Analysis class (CS224W). The final report is included in the repository (writeups/final_report.pdf). Running the code requires scikit-learn, networkx, and snap.py.
- Place Yelp academic datasets in data/provided
- Run dataset_maker.py to generate examples
- Run any of the following files:
  - dataset_metrics.py (prints various properties of the dataset)
  - random_baseline.py (random predictions)
  - random_walks.py (make predictions using unsupervised random walks)
  - similarity.py (make predictions using heuristic similarity measures)
  - supervised_classifier.py (make predictions using a supervised binary classifier)
  - supervised_random_walks.py (make predictions using supervised random walks; see Backstrom and Leskovec, 2011)
  - svd.py (make predictions using matrix factorization)
- Use eval.py to generate model evaluation metrics.
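
For readers unfamiliar with random-walk link prediction on a bipartite graph, here is a minimal sketch of the general idea. It is not the code in random_walks.py: the function names, the toy review list, the restart probability of 0.3, and the fixed iteration count are all assumptions made for illustration. It uses only the Python standard library and scores restaurants for a user with a random walk with restart, computed by power iteration.

```python
from collections import defaultdict


def build_bipartite(reviews):
    """Build an undirected adjacency map from (user, restaurant) pairs.

    Nodes are tagged ("u", name) or ("r", name) so that a user and a
    restaurant with the same name cannot collide.
    """
    adj = defaultdict(set)
    for user, restaurant in reviews:
        adj[("u", user)].add(("r", restaurant))
        adj[("r", restaurant)].add(("u", user))
    return dict(adj)  # plain dict: unknown nodes raise KeyError


def random_walk_with_restart(adj, source, restart=0.3, iters=50):
    """Approximate visit probabilities of a walk that jumps back to
    `source` with probability `restart` at every step (hypothetical
    default values)."""
    scores = {node: 0.0 for node in adj}
    scores[source] = 1.0
    for _ in range(iters):
        nxt = {node: 0.0 for node in adj}
        for node, p in scores.items():
            if p == 0.0:
                continue
            # Spread the non-restarting mass evenly over the neighbors.
            share = (1.0 - restart) * p / len(adj[node])
            for neighbor in adj[node]:
                nxt[neighbor] += share
        nxt[source] += restart  # restarting mass returns to the source
        scores = nxt
    return scores


def recommend(adj, user, k=3, **kwargs):
    """Rank restaurants the user has not reviewed by walk score."""
    source = ("u", user)
    scores = random_walk_with_restart(adj, source, **kwargs)
    seen = adj[source]
    candidates = [
        (score, node[1])
        for node, score in scores.items()
        if node[0] == "r" and node not in seen
    ]
    # Highest score first; ties broken alphabetically for determinism.
    candidates.sort(key=lambda item: (-item[0], item[1]))
    return [name for _, name in candidates[:k]]


# Toy data (made up for this example, not taken from the Yelp dataset).
reviews = [
    ("alice", "pizza"), ("alice", "sushi"),
    ("bob", "pizza"), ("bob", "sushi"), ("bob", "tacos"),
    ("carol", "tacos"), ("carol", "burgers"),
]
adj = build_bipartite(reviews)
print(recommend(adj, "alice", k=2))
```

In this toy graph, tacos ranks above burgers for alice because tacos is reachable through bob, who shares two restaurants with her, while burgers is only reachable through carol.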