amirj/recsys_eval

How Sensitive is Recommendation Systems' Offline Evaluation to Popularity?

Language: Jupyter Notebook. License: Apache-2.0.

This is the implementation of the following paper:

@InProceedings{recsys_eval19,
  author    = {Amir H. Jadidinejad and Craig Macdonald and Iadh Ounis},
  title     = {How Sensitive is Recommendation Systems' Offline Evaluation to Popularity?},
  booktitle = {Workshop on Offline Evaluation for Recommender Systems (REVEAL2019) at the 13th ACM Conference on Recommender Systems},
  year      = {2019},
}

Requirements

pytorch (1.0.1)
spotlight (0.1.5)
pytrec-eval (0.3)

Results

The following plot summarizes the results of popularity-stratified sampling:

Setting the popularity threshold P to its maximum makes the evaluation equivalent to the conventional offline evaluation of recommendation systems:

See the paper or our poster for more details.
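
The notebooks in this repository are the reference implementation. As a rough illustration only, the sketch below shows one way a popularity threshold P could restrict the test set. It assumes that an item's popularity is its number of training interactions and that a test interaction is kept when its item's popularity is at most P. The function names and toy data are hypothetical and are not taken from the notebooks or the paper.

```python
from collections import Counter


def item_popularity(train_interactions):
    """Count how many training interactions each item received."""
    return Counter(item for _user, item in train_interactions)


def filter_test_by_popularity(test_interactions, popularity, threshold):
    """Keep test interactions whose item has at most `threshold` training interactions.

    Items never seen in training are treated as having popularity 0 (an assumption).
    """
    return [
        (user, item)
        for user, item in test_interactions
        if popularity.get(item, 0) <= threshold
    ]


# Toy (user, item) pairs, purely illustrative.
train = [(0, "a"), (1, "a"), (2, "a"), (0, "b"), (1, "b"), (2, "c")]
test = [(3, "a"), (3, "b"), (4, "c"), (4, "d")]

popularity = item_popularity(train)
max_p = max(popularity.values())

# A low threshold keeps only long-tail items in the test set.
long_tail = filter_test_by_popularity(test, popularity, threshold=1)

# The maximum threshold keeps every test interaction, i.e. the usual offline setup.
everything = filter_test_by_popularity(test, popularity, threshold=max_p)
```

Under these assumptions, raising the threshold to the maximum observed popularity leaves the test set unchanged, which is consistent with the statement above that the maximum P corresponds to the conventional offline evaluation.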

How to reproduce?

Use the corresponding Jupyter notebook to reproduce the results for each dataset (MovieLens, Amazon) at a specific popularity threshold P: