Last edit : 29 July 2019
Recommender system using LightFM (without a knowledge graph), trained on a custom MovieLens-20M dataset.
Given a user and a list of items, find the top k items that the user might like.
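As a rough illustration of the task, the sketch below scores every (user, item) pair and keeps the k best items. `score_fn` and `toy_scores` are hypothetical stand-ins for the trained model, not code from this repository. With a fitted LightFM model the scores would presumably come from `model.predict(user_ids, item_ids)`.

```python
import heapq


def top_k_items(score_fn, user_id, item_ids, k=10):
    """Return the k item ids with the highest predicted score for one user."""
    # Score every (user, item) pair; score_fn is a stand-in for the trained model.
    scored = [(score_fn(user_id, item_id), item_id) for item_id in item_ids]
    # nlargest keeps only the k best pairs, ordered from highest to lowest score.
    return [item_id for _, item_id in heapq.nlargest(k, scored)]


# Hypothetical scores for illustration only (not produced by the real model).
toy_scores = {(0, 10): 0.2, (0, 11): 0.9, (0, 12): 0.5, (0, 13): 0.7}
print(top_k_items(lambda u, i: toy_scores[(u, i)], user_id=0, item_ids=[10, 11, 12, 13], k=2))
# prints [11, 13]
```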
/data
: contains the datasets used in training
/intersect-20m
: custom ml-20m dataset restricted to the movies that appear in Ripple-Net's knowledge graph
/ml-1m
: original MovieLens 1M dataset
/log
: contains training results, one folder per run, named after the training timestamp
/test
: contains the Jupyter notebooks used to test the trained models
logger.py
: logs training results and saves models
main.py
: script to run training
LightFM.ipynb
: Jupyter notebook used to build the LightFM model (no longer used)
*italic* means this folder is omitted from git, but is necessary if you need to run experiments
**bold** means this folder has its own README, check it for detailed information :)
pip3 install -r requirements.txt
You can download the intersect-20m dataset here.
Note: the dataset is kept in a separate repository because it is shared among models.
data/intersect-20m/ratings_re2.csv
: After downloading the dataset, unzip ratings_re2.zip and put the result in the same folder as the other downloaded files.
data/intersect-20m/ratings.csr
: CSR matrix (rows = users, columns = items, values = ratings) created from the intersect-20m ratings. This file can be created using this Jupyter notebook (Preprocess.ipynb inside the dataset).
Simply provide data/intersect-20m/ratings.csr and run main.py; the script will preprocess it before training begins.
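For readers unfamiliar with the CSR layout mentioned above, here is a minimal pure-Python sketch of how (user, item, rating) triples map to the three CSR arrays. The function name and toy triples are hypothetical. The actual file is presumably a SciPy sparse matrix produced by Preprocess.ipynb, which this sketch does not reproduce.

```python
def to_csr(triples, n_users):
    """Build CSR arrays (data, indices, indptr) from (user, item, rating) triples."""
    # Group the ratings by user so that each user becomes one matrix row.
    rows = [[] for _ in range(n_users)]
    for user, item, rating in triples:
        rows[user].append((item, rating))

    data, indices, indptr = [], [], [0]
    for row in rows:
        # Within a row, entries are stored in increasing column (item) order.
        for item, rating in sorted(row):
            indices.append(item)
            data.append(rating)
        # indptr[u]..indptr[u + 1] delimits the entries of user u.
        indptr.append(len(data))
    return data, indices, indptr


# Toy ratings for illustration only.
triples = [(0, 2, 4.0), (1, 0, 3.5), (0, 1, 5.0)]
print(to_csr(triples, n_users=2))
# prints ([5.0, 4.0, 3.5], [1, 2, 0], [0, 2, 3])
```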
- Prepare the dataset (see the dataset section)
- Run the training script
python3 main.py
There are several ways to change the training arguments:
- Open main.py and change the default values in the args parser.
- Run main.py with the required arguments.
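The two options above can be sketched with `argparse`. The option names and defaults below (`--epochs`, `--dim`, `--dataset`) are hypothetical and chosen only for illustration; the real ones are defined in main.py.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(description="Training options (illustrative sketch)")
    # Hypothetical options; check main.py for the real names and defaults.
    parser.add_argument("--epochs", type=int, default=10)
    parser.add_argument("--dim", type=int, default=32)
    parser.add_argument("--dataset", default="data/intersect-20m/ratings.csr")
    return parser


# Option 1: edit the defaults above, then run the script without arguments.
args_default = build_parser().parse_args([])

# Option 2: keep the defaults and override them on the command line,
# e.g. `python3 main.py --epochs 30 --dim 16` (flag names are assumptions).
args_cli = build_parser().parse_args(["--epochs", "30", "--dim", "16"])

print(args_default.epochs, args_cli.epochs, args_cli.dim)
# prints 10 30 16
```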
- Find the training result folder inside /log (find the latest) and copy the folder name.
- Create a copy of the latest Jupyter notebook inside the /test folder.
- Rename it to match the corresponding folder in /log (for traceability).
- Replace TESTING_CODE at the top of the notebook.
- Run the notebook.
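The first step above can be scripted. This is a minimal sketch that assumes the timestamp folder names sort chronologically when compared as strings (for example `YYYYMMDD-HHMMSS`). The actual naming scheme is set by logger.py and may differ, and `latest_run` is a hypothetical helper, not part of the repository.

```python
import os
import tempfile


def latest_run(log_dir):
    """Return the name of the most recent run folder in log_dir, or None if empty."""
    runs = [
        name for name in os.listdir(log_dir)
        if os.path.isdir(os.path.join(log_dir, name))
    ]
    # Assumption: folder names are timestamps that sort chronologically as strings.
    return max(runs) if runs else None


# Demonstration on a temporary directory with made-up run names.
with tempfile.TemporaryDirectory() as log_dir:
    for name in ["20190701-101500", "20190729-093000", "20190715-180000"]:
        os.mkdir(os.path.join(log_dir, name))
    print(latest_run(log_dir))
    # prints 20190729-093000
```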
Evaluated on | Prec@10 |
---|---|
500 users | 0.09100 |
1000 users | 0.09330 |
5000 users | 0.09902 |
13850 users | 0.09864 |
25000 users | 0.09764 |
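The README does not state how Prec@10 was computed. A common definition, assumed here, is the fraction of a user's top-k recommendations that appear among that user's held-out positive items, averaged over users. LightFM also ships `lightfm.evaluation.precision_at_k`, which may be what was actually used. The helper names and toy data below are hypothetical.

```python
def precision_at_k(ranked_items, relevant_items, k=10):
    """Fraction of the top-k ranked items that are in the relevant set."""
    hits = sum(1 for item in ranked_items[:k] if item in relevant_items)
    return hits / k


def mean_precision_at_k(rankings, relevants, k=10):
    """Average precision@k over users; both arguments are dicts keyed by user id."""
    scores = [precision_at_k(rankings[user], relevants[user], k) for user in rankings]
    return sum(scores) / len(scores)


# Toy data for illustration only.
rankings = {"u1": [1, 2, 3, 4], "u2": [5, 6, 7, 8]}
relevants = {"u1": {2, 4, 9}, "u2": {5}}
print(mean_precision_at_k(rankings, relevants, k=4))
# prints 0.375
```

Under this definition, a Prec@10 of about 0.098 means that roughly one of every ten recommended items is a held-out positive.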
Evaluated on | Distinct@10 | Unique items |
---|---|---|
10 users | 0.23000 | 23 |
30 users | 0.04333 | 13 |
100 users | 0.01300 | 13 |
1000 users | 0.00190 | 19 |
3000 users | 0.00043 | 13 |
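The values in the table are consistent with Distinct@10 being the number of unique recommended items divided by the total number of recommendation slots (users × 10), for example 23 / (10 × 10) = 0.23. This definition is inferred from the numbers rather than stated in the source, and the sketch below uses made-up recommendation lists.

```python
def distinct_at_k(recommendations, k=10):
    """Return (unique items / total slots, number of unique items) over all users."""
    # Collect every item that appears in any user's top-k list.
    unique_items = {item for recs in recommendations for item in recs[:k]}
    total_slots = len(recommendations) * k
    return len(unique_items) / total_slots, len(unique_items)


# Made-up lists: 10 users, 10 items each, drawn from a pool of 23 items.
recommendations = [[(user * 10 + j) % 23 for j in range(10)] for user in range(10)]
print(distinct_at_k(recommendations, k=10))
# prints (0.23, 23)
```

A low Distinct@10 with a nearly constant number of unique items, as in the table, indicates that most users receive the same small set of recommendations.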
- The model tends to suggest generic items that are rated highly by a large number of users.
- The LightFM model is much slower to train and test than Autorec, since it doesn't support GPU.
- It's user-centric: when predicting, the model requires (user, item) pairs as input and outputs a score for each pair.
- The model must be retrained before it can predict for a new user.
- The model can be improved using user and item features (which also helps with the cold-start problem).
- The model can handle the cold-start problem and can be improved using user and item features (e.g. user genre preferences or other features).
- The model uses a suitable loss and metric (Prec@k).
- The model doesn't learn when the dimension is set to 64.
- Jessin Donnyson - jessinra@gmail.com
- Benedict Tobias Henokh Wahyudi - tobi8800@gmail.com
- Michael Julio - michael.julio@gdplabs.id
- Fallon Candra - fallon.candra@gdplabs.id