This project implements a variety of recommender-system models, without regard to running speed or performance. Still under construction.
See examples in `test_reclib.py`.
Tips:
- Memory-based models and matrix factorization models evaluate results on trainY, while classification models evaluate on testY: the latter use information from trainY when modeling, whereas the former do not.
- Usually, sequence models split the dataset into train/test/predict by samples, while the other models split by time.
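To make the two split schemes concrete, here is a minimal sketch (the function names and the (user, item, timestamp) record layout are illustrative, not reclib's actual API):

```python
def split_by_time(interactions, train_ratio=0.8):
    """Time-based split: earliest interactions form the train part,
    later ones the held-out part. Records are (user, item, timestamp)."""
    ordered = sorted(interactions, key=lambda r: r[2])  # sort by timestamp
    cut = int(len(ordered) * train_ratio)
    return ordered[:cut], ordered[cut:]

def split_by_samples(sequences, train_ratio=0.8):
    """Sample-based split: whole user sequences go to either train or test."""
    cut = int(len(sequences) * train_ratio)
    return sequences[:cut], sequences[cut:]

interactions = [("u1", "i1", 3), ("u2", "i2", 1), ("u1", "i3", 2), ("u2", "i4", 4)]
train, held_out = split_by_time(interactions, 0.5)
```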
- Memory based Models
- User/Item CF models (with different similarity measurements)
- Content based models
- Matrix Factorization models
- Vanilla SVD
- SVD++
- Classification Models
- Naive Bayes
- Logistic Regression
- GBDT
- Sequence Models
- N-gram
- LSTM
- Time-aware LSTM
- Ensemble Models
- Bagging
- AdaBoost
- Stacking
Split dataset by time. These models have `train` and `predict` modes.
--------data set--------
|-- trainX --|-- trainY --|
|--- predictX ----|
- CF (Collaborative Filtering)
- Train: Calculate similarities on trainX, tune parameter K (the number of neighbours), do recommendation, evaluate on trainY, find the best K
- Predict: Re-calculate similarities on predictX, fix the best K, do recommendation
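The CF steps above can be sketched as user-based CF with cosine similarity over sparse rating dicts (a toy illustration; the names and data layout here are assumptions, not reclib's API):

```python
import math
from collections import defaultdict

def cosine_sim(a, b):
    """Cosine similarity between two sparse rating dicts {item: rating}."""
    common = set(a) & set(b)
    num = sum(a[i] * b[i] for i in common)
    den = (math.sqrt(sum(v * v for v in a.values()))
           * math.sqrt(sum(v * v for v in b.values())))
    return num / den if den else 0.0

def recommend(user, ratings, k=2, n=1):
    """Score items unseen by `user` with similarity-weighted ratings
    from the K most similar neighbours; return the top-n items."""
    sims = sorted(((cosine_sim(ratings[user], ratings[u]), u)
                   for u in ratings if u != user), reverse=True)[:k]
    scores = defaultdict(float)
    for s, u in sims:
        for item, r in ratings[u].items():
            if item not in ratings[user]:
                scores[item] += s * r
    return sorted(scores, key=scores.get, reverse=True)[:n]

ratings = {
    "alice": {"a": 5, "b": 3},
    "bob":   {"a": 5, "b": 3, "c": 4},
    "carol": {"d": 1},
}
```

Tuning K then means running `recommend` with each candidate K on trainX and picking the K that scores best on trainY.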
- CB (Content Based)
- Train: Tune parameter D (the dimensionality of the features), extract features for users and items from trainX, calculate similarities, do recommendation, evaluate on trainY
- Predict: Re-extract features for users and items from predictX, calculate similarities, do recommendation
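Content-based recommendation can likewise be sketched: build a user profile by averaging the feature vectors of consumed items, then score unseen items by dot product (the hand-made D=2 vectors below stand in for whatever feature extraction is actually used):

```python
def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def user_profile(user_items, item_features):
    """Average the feature vectors of the items a user has interacted with."""
    vecs = [item_features[i] for i in user_items]
    return [sum(col) / len(vecs) for col in zip(*vecs)]

def recommend_cb(user_items, item_features, n=1):
    """Rank items the user has not seen by similarity to the profile."""
    profile = user_profile(user_items, item_features)
    candidates = [(dot(profile, f), i) for i, f in item_features.items()
                  if i not in user_items]
    return [i for _, i in sorted(candidates, reverse=True)[:n]]

item_features = {"a": [1, 0], "b": [1, 0.2], "c": [0.9, 0.1], "d": [0, 1]}
```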
Split dataset by time. These models have `train` and `predict` modes.
--------data set--------
|-- trainX --|-- trainY --|
|--- predictX ----|
- Vanilla SVD
- SVD++
- Train: Tune parameter D (D is dimension of latent features), learn latent features for users and items on trainX, evaluate on trainY, find best D
- Predict: Fix best D, learn latent features for users and items on predictX, do recommendation
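The Train step for vanilla SVD can be sketched as SGD on the squared reconstruction error (a minimal FunkSVD-style illustration with made-up hyperparameters; SVD++ would add implicit-feedback terms on top of this):

```python
import random

def svd_train(ratings, n_users, n_items, d=2, lr=0.05, reg=0.02, epochs=500):
    """Learn d-dimensional latent vectors P (users) and Q (items) by SGD
    on (user, item, rating) triples with L2 regularization."""
    random.seed(0)
    P = [[random.gauss(0, 0.1) for _ in range(d)] for _ in range(n_users)]
    Q = [[random.gauss(0, 0.1) for _ in range(d)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - sum(P[u][f] * Q[i][f] for f in range(d))
            for f in range(d):
                pu, qi = P[u][f], Q[i][f]
                P[u][f] += lr * (err * qi - reg * pu)  # gradient step on user factor
                Q[i][f] += lr * (err * pu - reg * qi)  # gradient step on item factor
    return P, Q

ratings = [(0, 0, 5), (0, 1, 1), (1, 0, 4), (1, 1, 1)]
P, Q = svd_train(ratings, n_users=2, n_items=2)
predicted = sum(P[0][f] * Q[0][f] for f in range(2))  # should approach the true rating 5
```

Finding the best D then amounts to re-running this with different `d` and comparing error on trainY.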
Split dataset by time. These models have `train`, `test` and `predict` modes.
--------data set----------------
|-- trainX --|-- trainY --|
|-- testX --|-- testY --|
|-- predictX-|
- NB (Naive Bayes)
- Train: Extract features from trainX, calculate probabilities p(y|x) on trainX and trainY
- Test: Do classification and evaluate on testX and testY
- Predict: Re-calculate probabilities on predictX, do classification
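The Naive Bayes train/predict steps can be sketched with categorical feature counts and Laplace smoothing (a from-scratch toy; the feature extraction itself is elided):

```python
import math
from collections import Counter, defaultdict

def nb_train(X, y, alpha=1.0):
    """Count class priors and per-class feature frequencies.
    alpha is the Laplace smoothing constant used at prediction time."""
    class_counts = Counter(y)
    feat_counts = defaultdict(Counter)
    for xs, label in zip(X, y):
        feat_counts[label].update(xs)
    vocab = {f for xs in X for f in xs}
    return class_counts, feat_counts, vocab, alpha

def nb_predict(xs, model):
    """Pick the class maximizing log p(y) + sum over features of log p(x|y)."""
    class_counts, feat_counts, vocab, alpha = model
    total = sum(class_counts.values())
    best, best_score = None, float("-inf")
    for c, cc in class_counts.items():
        score = math.log(cc / total)
        denom = sum(feat_counts[c].values()) + alpha * len(vocab)
        for f in xs:
            score += math.log((feat_counts[c][f] + alpha) / denom)
        if score > best_score:
            best, best_score = c, score
    return best

X = [["click", "click"], ["skip"], ["click"], ["skip", "skip"]]
y = [1, 0, 1, 0]
model = nb_train(X, y)
```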
- LR (Logistic Regression)
- Train: Extract features from trainX, tune parameters e (a parameter of the sigmoid function) and C (the coefficient of the regularization term), learn model weights on trainY
- Test: Do classification and evaluate on testX and testY
- Predict: Re-extract features on predictX, fix best e and C, do classification
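A sketch of the LR training step, using batch gradient descent on the L2-regularised logistic loss (here C multiplies the penalty, matching the description above; sklearn-style libraries use the inverse convention):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lr_train(X, y, C=1.0, lr=0.1, epochs=1000):
    """Batch gradient descent on mean logistic loss + C * L2 penalty
    over weights (the bias is left unregularised)."""
    d = len(X[0])
    w, b, n = [0.0] * d, 0.0, len(X)
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for xs, t in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xs)) + b)
            err = p - t  # derivative of logistic loss w.r.t. the logit
            for j in range(d):
                gw[j] += err * xs[j]
            gb += err
        w = [wj - lr * (gwj / n + C * wj) for wj, gwj in zip(w, gw)]
        b -= lr * gb / n
    return w, b

X = [[0.0], [1.0], [2.0], [3.0]]
y = [0, 0, 1, 1]
w, b = lr_train(X, y, C=0.01)
```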
- XGBoost
Split dataset by samples. These models have `train`, `test` and `predict` modes.
--------data set----------------
|-- train(seqX,seqY) --|
|-- test(seqX,seqY) --|
|-- predict(seqX)-|
- N-gram
- LSTM
- Time-aware LSTM
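A bigram next-item predictor illustrates the simplest of the sequence models above (a toy sketch over item sequences; the names are illustrative, not reclib's interface):

```python
from collections import Counter, defaultdict

def bigram_train(sequences):
    """Count item -> next-item transitions across all training sequences."""
    trans = defaultdict(Counter)
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            trans[prev][nxt] += 1
    return trans

def predict_next(trans, last_item):
    """Return the most frequent follower of the last seen item, or None if unseen."""
    followers = trans.get(last_item)
    return followers.most_common(1)[0][0] if followers else None

sequences = [["a", "b", "c"], ["a", "b", "d"], ["x", "b", "c"]]
trans = bigram_train(sequences)
```

The LSTM variants replace the count table with a learned hidden state; the time-aware version additionally feeds the gap between timestamps into the gates.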