Top-N recommendation is widely used in personalized services to serve users with diverse interests. However, as we observe from the data, user activity level also plays an important role in recommendation. Existing studies pay little attention to this issue: they simply assume that the preferences of all users follow a common probability distribution and then use a fixed schema (e.g., one latent vector) to model the user representation. This assumption makes it hard for existing models to accommodate users of different activity levels. In this work, we propose the Variational Kernel Density Estimation (VKDE) model, a non-parametric estimator that aims to fit arbitrary preference distributions. VKDE divides the user representation into multiple latent vectors, each corresponding to one facet of the user's interests. Multiple local distributions are generated by a variational kernel function and then aggregated into the user's global preference distribution. To reduce training complexity while preserving recommendation effectiveness, we further propose a sampling strategy. Experimental results on three public datasets show that VKDE outperforms state-of-the-art models and greatly improves accuracy for users of different activity levels.
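To give a feel for the kernel-aggregation idea described above (multiple per-facet local distributions combined into one global preference distribution), here is a minimal 1-D sketch using Gaussian kernels. This is an illustration only, not the paper's actual variational model; the facet centers and bandwidth below are made-up toy values.

```python
import numpy as np

def gaussian_kernel(x, center, bandwidth):
    """Local density from one facet vector (1-D Gaussian kernel, sketch only)."""
    return np.exp(-0.5 * ((x - center) / bandwidth) ** 2) / (bandwidth * np.sqrt(2 * np.pi))

def global_preference_density(x, facet_centers, bandwidth=0.5):
    """Aggregate per-facet local densities into one global density (equal weights)."""
    local_densities = [gaussian_kernel(x, c, bandwidth) for c in facet_centers]
    return np.mean(local_densities, axis=0)

# Toy user with three one-faceted interests; the resulting density is multimodal,
# which a single latent vector (one mode) could not capture.
x = np.linspace(-5, 5, 1001)
density = global_preference_density(x, facet_centers=[-1.0, 0.0, 2.0])
```

Because each kernel is a proper density and the weights sum to one, the aggregated global density also integrates to one.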
```shell
pip install -r requirements.txt
```
We provide three processed datasets: Yelp2018, Amazon-book, and Video Games (the other two datasets will be uploaded soon). See `dataloader.py` for details.
To run VKDE on the Yelp2018 dataset:

- Change the base directory: set `ROOT_PATH` in `src/world.py`.
- Run the command:

```shell
cd code/src && python main.py --dataset yelp2018 --topks=[20] --model VKDE --epoch 400 --tau_model2 0.1 --reg_model2 0.001 --dropout_model2 0.5 --lr 0.001 --cuda 0 --enc_dims [64]
```
- Log output:

```
...
======================
{'precision': array([0.03109132]), 'recall': array([0.06926436]), 'ndcg': array([0.05608385])}
EPOCH[11/400] Elapsed time: 85.2 Neg_ll: 148403.24
...
======================
{'precision': array([0.03548693]), 'recall': array([0.0784675]), 'ndcg': array([0.06469457])}
EPOCH[171/400] Elapsed time: 86.6 Neg_ll: 129988.29
...
```
To train with the proposed sampling strategy, add `--sampling 1`:

```shell
cd code/src && python main.py --dataset yelp2018 --topks=[20] --model VKDE --epoch 400 --tau_model2 0.1 --reg_model2 0.001 --dropout_model2 0.5 --lr 0.001 --enc_dims [64] --sampling 1 --cuda 0
```
All metrics are computed at top-20, as reported in the paper.
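For readers unfamiliar with the top-20 metrics in the logs above, here is a hedged per-user sketch of Recall@K and NDCG@K (standard definitions; the repo's own evaluation code may batch or average these differently):

```python
import numpy as np

def recall_at_k(ranked, relevant, k=20):
    """Fraction of a user's relevant items that appear in the top-k ranking."""
    hits = len(set(ranked[:k]) & relevant)
    return hits / len(relevant) if relevant else 0.0

def ndcg_at_k(ranked, relevant, k=20):
    """DCG of the top-k ranking, normalized by the best achievable DCG."""
    dcg = sum(1.0 / np.log2(i + 2) for i, item in enumerate(ranked[:k]) if item in relevant)
    idcg = sum(1.0 / np.log2(i + 2) for i in range(min(len(relevant), k)))
    return dcg / idcg if idcg > 0 else 0.0
```

The reported numbers are these per-user values averaged over all test users.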
PyTorch version results (stopped at 400 epochs, seed=2022):