How to apply NCF to datasets that only have the number of interactions?
sgaseretto opened this issue · 9 comments
As the title says, how can this be applied to a dataset that only has the number of interactions between a user and an item? Movielens has ratings, which are explicit feedback, but how could this model be applied to a dataset like audioscrobbler, where the implicit feedback is the number of times a user listened to an artist? Here is an example of recommendations that uses ALS on that dataset: http://www.gousios.gr/courses/bigdata/audioscrobbler.html
I think NCF addresses implicit feedback as well. For a dataset that only contains interactions between users and items, you can try the BPR (Bayesian Personalized Ranking) criterion as the loss function. Specifically, the existing interactions are the positive samples, while the negative samples have to be sampled manually.
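To make the BPR suggestion concrete, here is a minimal sketch in plain Python. It is not code from this repo: the names `sample_negative`, `bpr_loss`, and the toy `interactions` dictionary are made up for illustration, and uniform sampling over non-interacted items is just one common way to draw negatives.

```python
import math
import random


def sample_negative(user_items, num_items, rng):
    """Draw one item id the user has NOT interacted with (uniform rejection sampling).

    Assumes the user has not interacted with every item, otherwise this loops forever.
    """
    while True:
        item = rng.randrange(num_items)
        if item not in user_items:
            return item


def bpr_loss(pos_score, neg_score):
    """BPR loss for one (positive, negative) pair: -log(sigmoid(pos_score - neg_score))."""
    diff = pos_score - neg_score
    # -log(sigmoid(diff)) == log(1 + exp(-diff)), computed without overflow
    if diff >= 0:
        return math.log1p(math.exp(-diff))
    return -diff + math.log1p(math.exp(diff))


# Hypothetical toy data: user id -> set of item ids the user interacted with
interactions = {0: {1, 3}, 1: {0}}
num_items = 5
rng = random.Random(0)

# One (user, positive item, sampled negative item) triple per observed interaction
triples = [
    (user, pos, sample_negative(items, num_items, rng))
    for user, items in interactions.items()
    for pos in sorted(items)
]
```

In a real model the two scores would come from the network's output for the (user, positive item) and (user, negative item) pairs, and the loss would be averaged over a batch of such triples.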
According to the original paper, NCF was proposed to deal with implicit feedback. May I ask why this repo uses normalization to process the ratings?
Negative items get a rating of 0 in this repo, and I normalized the ratings into [0, 1]. I think it is fine if you do not normalize the ratings, but it might then be hard to tune the hyperparameters.
I took a look at the authors' implementation and found the function here. They just set all negative items to 0 and all items the user interacted with to 1, rather than normalizing them (implicit feedback vs. explicit feedback).
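The difference between the two treatments can be sketched as follows. This is only an illustration: the function names `normalize` and `binarize` and the sample values are hypothetical, and it assumes that "normalizing" means dividing by the maximum rating.

```python
def normalize(ratings):
    """Explicit-feedback style: scale each rating by the maximum rating."""
    max_rating = max(ratings)
    return [r / max_rating for r in ratings]


def binarize(ratings):
    """Implicit-feedback style: any observed interaction becomes 1, otherwise 0."""
    return [1.0 if r > 0 else 0.0 for r in ratings]


# Hypothetical explicit ratings on a 1-5 scale
ratings = [5, 3, 1, 4]
normalized = normalize(ratings)

# Hypothetical play counts, as in an audioscrobbler-like dataset
play_counts = [120, 1, 7]
binary_targets = binarize(play_counts)
```

With binarization, a dataset that only has interaction counts is handled the same way as one with ratings: every observed user-item pair becomes a positive with target 1, and sampled unobserved pairs get target 0.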
@RuihongQiu Thank you for reporting the bug. I added support for implicit feedback in the latest commit. Could you check whether it works well? I have only tested GMF.
@LaceyChen17 I will check it out soon.
Thanks a lot! I think the new code works.
I have checked all the experiments with the new rating settings.
The first two experiments are actually explicit feedback with normalization on ratings.
Files whose names end with "implicit" are the results of the newest commit.
I also implemented a new binarize method that works the same way "_normalize" does. It avoids many of the line changes made in the newest version.
The results of the binarize method are in the files whose names end with "binarize".
If you wouldn't mind, I can open a pull request.
Pull requests are extremely welcome!!! BTW, could you also share your training curves by updating README.md?
I've opened a pull request for the update.
Merged. Thank you so much!