Uses two models (Collective Matrix Factorization and Factorization Machine) to build a recommendation and personalization system for the Yelp ratings dataset.

project-2-final-jlyc-fp

project-2-final-jlyc-fp created by GitHub Classroom

This is a recommendation project for the Yelp dataset from the Yelp Dataset Challenge. The dataset can be downloaded from https://www.yelp.com/dataset/challenge

Team members: Xiao Ji (xj2247@columbia.edu, xj2247), Xinyi Liu (xl2904@columbia.edu, xl2904), Jiaying Chen (jc5299@columbia.edu, jc5299), Duanyue Yun (dy2400@columbia.edu, dy2400)

See Final Report.ipynb for our final report.

Repository contents:

  • Codes folder
    • Bias baseline & user, item segmentation.ipynb: bias baseline model, user and item segmentation, and construction of user attributes
    • CMF.ipynb: code for collective matrix factorization
    • Business.ipynb: exploratory analysis on business.json
    • MF baseline - ALS.ipynb: Matrix Factorization (using Spark ALS) baseline model
    • lightFM - Feature selection.ipynb: Factorization machine model feature selection
    • lightFM - cross validation.ipynb: Factorization machine model hyper-parameter cross validation
    • lightFM full dataset - overall results.ipynb: Factorization machine model full dataset precision and AUC
    • lightFM full dataset - precision by segment.ipynb: Factorization machine model full dataset precision by active/moderate/non-active users and popular/moderate/unpopular items
    • lightFM full dataset - auc by segment.ipynb: Factorization machine model full dataset AUC by active/moderate/non-active users and popular/moderate/unpopular items
  • Images folder
    • Images included in final report
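
For readers unfamiliar with the bias baseline listed under the Codes folder, here is a minimal sketch of one common formulation: a rating is predicted as the global mean plus a per-user offset and a per-item offset. This is an illustration only and may differ from what the notebook does. The function names (`fit_bias_baseline`, `predict`), the damping parameter `reg`, and the toy ratings are all assumptions made for this example; the only dataset fact relied on is that Yelp star ratings range from 1 to 5.

```python
from collections import defaultdict


def fit_bias_baseline(ratings, reg=5.0):
    """Fit global mean, item biases, and user biases.

    ratings: list of (user_id, item_id, stars) tuples.
    reg: hypothetical damping term that shrinks biases of users/items
         with few ratings toward zero (an assumption, not from the repo).
    """
    mu = sum(r for _, _, r in ratings) / len(ratings)

    # Item bias: damped average of (rating - global mean) per item.
    item_sum, item_cnt = defaultdict(float), defaultdict(int)
    for _, item, r in ratings:
        item_sum[item] += r - mu
        item_cnt[item] += 1
    b_i = {i: item_sum[i] / (reg + item_cnt[i]) for i in item_sum}

    # User bias: damped average of what remains after removing
    # the global mean and the item bias.
    user_sum, user_cnt = defaultdict(float), defaultdict(int)
    for user, item, r in ratings:
        user_sum[user] += r - mu - b_i[item]
        user_cnt[user] += 1
    b_u = {u: user_sum[u] / (reg + user_cnt[u]) for u in user_sum}

    return mu, b_u, b_i


def predict(model, user, item):
    """Predict a star rating; unseen users/items fall back to zero bias."""
    mu, b_u, b_i = model
    pred = mu + b_u.get(user, 0.0) + b_i.get(item, 0.0)
    # Clip to the 1-5 star range used by Yelp ratings.
    return min(5.0, max(1.0, pred))


# Toy data, made up for illustration.
toy_ratings = [
    ("u1", "b1", 5), ("u1", "b2", 3),
    ("u2", "b1", 4), ("u2", "b2", 2),
]
model = fit_bias_baseline(toy_ratings, reg=0.0)
print(predict(model, "u1", "b1"))        # known user and business
print(predict(model, "new_user", "b1"))  # cold-start user: mean + item bias
```

A baseline like this is a useful reference point because any matrix factorization or factorization machine model should beat it; for cold-start users or businesses it degrades gracefully to the global mean.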