This repo contains the source code for our paper "Regularizing Matrix Factorization with User and Item Embeddings for Recommendation", published in CIKM 2018. The implementation is multi-threaded, so it runs fast on large datasets.
The data format is:
- First line: the header "userId,movieId"
- Second line to last line: [userId],[movieId]
We preprocessed it and split it into train, validation/dev, and test sets. Their paths are:
- data/ml10m/train.csv
- data/ml10m/test.csv
- data/ml10m/validation.csv
Each of these files has the same format:
- First line: the header "userId,movieId"
- Second line to last line: [userId],[movieId]
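For example, the first few lines of data/ml10m/train.csv will look like the following (the IDs here are illustrative, not actual rows from the dataset):

```
userId,movieId
0,1193
0,661
1,914
```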
If users' disliked items are available, do 2 steps:
- save them to data/ml10m/train_neg.csv (a sketch of producing this file from raw explicit ratings is shown after this list)
- build the disliked item-item co-occurrence matrix by running (assuming the dataset is ml10m): python produce_negative_cooccurrence.py --dataset ml10m
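The repo does not prescribe how train_neg.csv is produced; a minimal sketch, assuming you have a raw ratings.csv with a rating column and treat ratings <= 2 as dislikes (both are assumptions, not part of this repo), could be:

```python
# Hypothetical: derive disliked user-item pairs from raw explicit ratings.
# Assumes data/ml10m/ratings.csv has columns userId,movieId,rating; the
# "rating <= 2 means disliked" threshold is an assumption, not the repo's rule.
import pandas as pd

ratings = pd.read_csv('data/ml10m/ratings.csv')
disliked = ratings[ratings['rating'] <= 2.0]
disliked[['userId', 'movieId']].to_csv('data/ml10m/train_neg.csv', index=False)
```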
Step 1.1: produce the positive co-occurrence matrix of item-item:
python produce_positive_cooccurrence.py --dataset ml10m
Step 1.2: produce the negative co-occurrence matrix of item-item (if the disliked items are available; if not, that's OK, we will infer disliked items in step 2):
python produce_negative_cooccurrence.py --dataset ml10m
Step 2.1: run RME with available disliked items (i.e. you ran step 1.2):
python rme_rec.py --dataset ml10m --model rme --neg_item_inference 0 --n_factors 40 --reg 1.0 --reg_embed 1.0
Step 2.2: run RME with our user-oriented EM-like algorithm to infer disliked items for users (when disliked items are not available and you could not run step 1.2):
python rme_rec.py --dataset ml10m --model rme --neg_item_inference 1 --n_factors 40 --reg 1.0 --reg_embed 1.0
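To make step 2.2 concrete, here is a minimal, runnable sketch of what the E-like step of such a loop can look like; sample_disliked is an illustrative name, and the actual sampling and update rules live in rme_rec.py:

```python
# Illustrative E-like step only (not the repo's code): infer disliked items
# for each user as the unconsumed items the current factors score lowest.
import numpy as np

def sample_disliked(U, V, consumed, neg_sample_ratio=0.2):
    """U: (n_users, k) user factors; V: (n_items, k) item factors;
    consumed: dict mapping user id -> list of consumed item ids."""
    scores = U @ V.T
    neg = {}
    for u, items in consumed.items():
        n_neg = max(1, int(neg_sample_ratio * len(items)))
        s_u = scores[u].copy()
        s_u[items] = np.inf                        # never sample consumed items
        neg[u] = np.argsort(s_u)[:n_neg].tolist()  # lowest predicted scores
    return neg
```

The M-like step would then refit the user and item factors using the liked items, these inferred dislikes, and the embedding regularizers.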
where:
- model: the model to run. There are 3 choices: rme (our model), wmf, cofactor.
- reg: the regularization hyper-parameter for the user and item latent factors (alpha and beta).
- reg_embed: the regularization hyper-parameter for the user and item context latent factors (gamma, theta, delta).
- n_factors: the number of latent factors (i.e. the embedding size). Default: 40.
- neg_item_inference: whether to run our user-oriented EM-like algorithm for sampling disliked items for users. If user-disliked items are already available, set this to 0.
- neg_sample_ratio: the negative sample ratio per user. If a user consumed 10 items and neg_sample_ratio = 0.2, then 2 negative items are randomly sampled for that user. Default: 0.2.
- s: the shifted constant, a hyper-parameter that controls the density of the SPPMI matrix (see the sketch after this list). Default: 1.
- data_path: the path to the data. Default: data.
- saved_model_path: the path where the model that is optimal on the validation/development dataset is saved. Default: MODELS.
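Since the s parameter above shifts the SPPMI matrix, here is a minimal sketch of the standard shifted positive PMI construction from item-item co-occurrence counts (the dense NumPy layout and the function name are illustrative; the repo operates on sparse matrices):

```python
# Standard SPPMI construction from co-occurrence counts (illustrative layout).
import numpy as np

def sppmi(cooccur, s=1.0):
    """cooccur: (n_items, n_items) co-occurrence counts; s: shift constant."""
    total = cooccur.sum()
    row = cooccur.sum(axis=1, keepdims=True)   # marginal count per item
    col = cooccur.sum(axis=0, keepdims=True)
    with np.errstate(divide='ignore', invalid='ignore'):
        pmi = np.log(cooccur * total / (row * col))
    pmi[~np.isfinite(pmi)] = 0.0               # zero out empty co-occurrences
    # Shift by log(s) and keep positives; a larger s yields a sparser matrix.
    return np.maximum(pmi - np.log(s), 0.0)
```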
You may get some results like:
top-5 results: recall@5 = 0.1559, ndcg@5 = 0.1613, map@5 = 0.1076
top-10 results: recall@10 = 0.1513, ndcg@10 = 0.1547, map@10 = 0.0851
top-20 results: recall@20 = 0.1477, ndcg@20 = 0.1473, map@20 = 0.0669
top-50 results: recall@50 = 0.1819, ndcg@50 = 0.1553, map@50 = 0.0562
top-100 results: recall@100 = 0.2533, ndcg@100 = 0.1825, map@100 = 0.0579
- Running cofactor:
python rme_rec.py --dataset ml10m --model cofactor --n_factors 40 --reg 1.0 --reg_embed 1.0
You may get results like:
top-5 results: recall@5 = 0.1522, ndcg@5 = 0.1537, map@5 = 0.1000
top-10 results: recall@10 = 0.1383, ndcg@10 = 0.1425, map@10 = 0.0756
top-20 results: recall@20 = 0.1438, ndcg@20 = 0.1391, map@20 = 0.0606
top-50 results: recall@50 = 0.1762, ndcg@50 = 0.1484, map@50 = 0.0518
top-100 results: recall@100 = 0.2545, ndcg@100 = 0.1783, map@100 = 0.0540
- Running WMF:
python rme_rec.py --dataset ml10m --model wmf --n_factors 40 --reg 1.0 --reg_embed 1.0
You may get results like:
top-5 results: recall@5 = 0.1258, ndcg@5 = 0.1283, map@5 = 0.0810
top-10 results: recall@10 = 0.1209, ndcg@10 = 0.1231, map@10 = 0.0624
top-20 results: recall@20 = 0.1290, ndcg@20 = 0.1230, map@20 = 0.0507
top-50 results: recall@50 = 0.1641, ndcg@50 = 0.1349, map@50 = 0.0442
top-100 results: recall@100 = 0.2375, ndcg@100 = 0.1640, map@100 = 0.0470
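For reference, the reported metrics follow the usual top-k definitions; a hedged sketch of recall@k and ndcg@k for a single user is below (this is one common formulation, not a copy of the repo's evaluation code):

```python
# Common top-k metric definitions for one user (illustrative, binary relevance).
import numpy as np

def recall_at_k(ranked_items, heldout_items, k):
    """Fraction of held-out items recovered in the top-k recommendations."""
    hits = len(set(ranked_items[:k]) & set(heldout_items))
    return hits / min(k, len(heldout_items))

def ndcg_at_k(ranked_items, heldout_items, k):
    """Truncated NDCG with binary relevance."""
    heldout = set(heldout_items)
    dcg = sum(1.0 / np.log2(r + 2)
              for r, item in enumerate(ranked_items[:k]) if item in heldout)
    idcg = sum(1.0 / np.log2(r + 2) for r in range(min(k, len(heldout))))
    return dcg / idcg
```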
If you use our method (RME) in your paper, please cite the paper:
@inproceedings{tran2018regularizing,
title={Regularizing Matrix Factorization with User and Item Embeddings for Recommendation},
author={Tran, Thanh and Lee, Kyumin and Liao, Yiming and Lee, Dongwon},
booktitle={Proceedings of the 27th ACM International Conference on Information and Knowledge Management},
pages={687--696},
year={2018},
organization={ACM}
}