Last edit : 29 July 2019
Recommender system using RippleNet, trained using custom MovieLens-20M dataset.
This repository is the implementation of RippleNet (arXiv):
RippleNet: Propagating User Preferences on the Knowledge Graph for Recommender Systems
Hongwei Wang, Fuzheng Zhang, Jialin Wang, Miao Zhao, Wenjie Li, Xing Xie, Minyi Guo
The 27th ACM International Conference on Information and Knowledge Management (CIKM 2018)
RippleNet is a deep end-to-end model that naturally incorporates the knowledge graph into recommender systems. Ripple Network overcomes the limitations of existing embedding-based and path-based KG-aware recommendation methods by introducing preference propagation, which automatically propagates users' potential preferences and explores their hierarchical interests in the KG.
A PyTorch re-implementation of RippleNet by Qibin Chen et al. is here.
Given a user and an item, predict how likely the user will interact with the item
/data
: contains dataset to use in training/book
: (not used)/movie
: custom ml-20m dataset where only movies shows up in Ripple-Net's knowledge graph used.
/log
: contains training result stored in single folder named after training timestamp./test
: contains jupyter notebook used in testing the trained models/src
: implementations of RippleNet.
*italic* means this folder is ommited from git, but necessary if you need to run experiments
**bold** means this folder has it's own README, check it for detailed information :)
pip3 install -r requirements.txt
You can download the intersect-20m dataset here. Copy all items inside and put into /movie
folder except the Preprocess.ipynb.
Note : Dataset is put on separate repository because it's shared among models.
- To create missing
kg_final.txt
&ratings_final.txt
files:
- either use this jupyter notebook (
Preprocess.ipynb
inside the./data/movie
). - or simply run :
python3 src/preprocess.py
-
To create the rest of the files :
Simply provide
kg_final.txt
v andratings_final.txt
then runpython3 src/main.py
the script will preprocess it (and save the result for cache) before training begin.
- Prepare the dataset and the preprocessed version (check section below this)
- Run the training script
python3 src/main.py
To start using new dataset, or if dataset changed, those file need to be deleted, and re preprocess it, otherwise it will use old dataset:
data/movie/ratings_final.npy
data/movie/kg_final.npy
data/movie/preprocessed_data_info_32
or_64
There are several ways to do this :
- Open
src/main.py
and change the args parser default value - run
src/main.py
with arguments required.
- Find the training result folder inside
/log
(find the latest), copy the folder name. - Create copy of latest jupyter notebook inside
/test
folder. - Rename folder to match a folder in
/log
(for traceability purpose). - Replace
TESTING_CODE
at the top of the notebook. - Run the notebook
Evaluation size | Prec@10 | Distinct@10 | Unique items |
---|---|---|---|
Eval on 500 user | 0.26719 | 0.04640 | 232 |
Eval on 1000 user | 0.26939 | 0.02860 | 286 |
Eval on 3000 user | 0.26379 | 0.01317 | 395 |
Eval on 5000 user | 0.26573 | 0.00914 | 457 |
-
Looking from the result of Ripplenet-1M, the usage of KG might turn out to be quite promising, especially to improve the diversity of suggestion.
-
The downside of using RippleNet is that the train and test both takes a lot of time (almost 1 hr/epoch for training, and even longer for testing). This probably caused by the nature of RippleNet which supposed to predict
Given a user and an item, how likely the user will interact with the item
and not
Given a user and a list of items, find the top k items that user might like
-
Summary compared to non-KG RecSys: Big improvement in terms of Prec@k and diversity
- Able to incorporate Knowledge Graph as another source of information with relatively simple approach (embedding)
- Higher Prec@K compared to KPRN and its derivative models.
- Require much less memory and disk space compared to KORN.
- Doesn't require too much hand crafted features.
- During training, the model converge really fast (< 10 epochs)
- Require relatively slow pre-preprocessing
- The train and test both takesMore diverse suggestion. a lot of time (almost 1 hr/epoch for training, and even longer for testing).
- The model remember the user, the model need to be re-trained for every new user and item addition.
- Loss function and metric used in training is not Prec@K, instead it uses AUC and accuracy.
- Output less diverse compared to KPRN
- Looking from the result of Ripplenet-1M, the usage of KG might turn out to be quite promising, especially to improve the diversity of suggestion.
- Use lower dimension of embedding (n = 16) result in much better performance and faster training result.
- Jessin Donnyson - jessinra@gmail.com
- Benedict Tobias Henokh Wahyudi - tobi8800@gmail.com
- Michael Julio - michael.julio@gdplabs.id
- Fallon Candra - fallon.candra@gdplabs.id