Python-based framework for building and evaluating session-based recommender systems.

session-rec

Introduction

session-rec is a Python-based framework (Python 3.5.x) for building and evaluating recommender systems. It implements a suite of state-of-the-art algorithms and baselines for session-based recommendation.

Parts of the framework and its algorithms are based on code developed and shared by:

  • Rendle et al., BPR: Bayesian Personalized Ranking from Implicit Feedback, UAI 2009. (Original Code).
  • Mi et al., Context Tree for Adaptive Session-based Recommendation, 2018. (Code shared by the authors).
  • Hidasi et al., Recurrent Neural Networks with Top-k Gains for Session-based Recommendations, CoRR abs/1706.03847, 2017. (Original Code).
  • Liu et al., STAMP: Short-Term Attention/Memory Priority Model for Session-based Recommendation, KDD 2018. (Original Code).
  • Li et al., Neural Attentive Session-based Recommendation, CIKM 2017. (Original Code).
  • Yuan et al., A Simple but Hard-to-Beat Baseline for Session-based Recommendations, CoRR abs/1808.05163, 2018. (Code shared by the authors).
  • Rendle et al., Factorizing Personalized Markov Chains for Next-basket Recommendation. WWW 2010. (Original Code).
  • Kabbur et al., FISM: Factored Item Similarity Models for top-N Recommender Systems, KDD 2013. (Original Code).
  • He and McAuley. Fusing Similarity Models with Markov Chains for Sparse Sequential Recommendation. CoRR abs/1609.09152, 2016. (Original Code).

Requirements

To run session-rec, the following libraries are required:
  • Anaconda 4.X (Python 3.5+)
  • Pympler
  • NumPy
  • SciPy
  • BLAS
  • Sklearn
  • Dill
  • Pandas
  • Theano
  • Pyyaml
  • CUDA
  • Tensorflow
  • Psutil
  • Python-telegram-bot

Installation

Using Anaconda (Windows users)

  1. Download and Install Anaconda (https://www.anaconda.com/distribution/)
  2. Run the following commands:
    1. git clone https://github.com/kyraropmet/session-rec.git
    2. From the main folder, run:
       conda install --yes --file requirements_conda.txt
       pip install -r requirements_pip.txt

Using Docker (Linux users with a GPU that supports CUDA 9)

  1. Download and Install Docker (https://www.docker.com/)
  2. Run the following commands:
    1. docker pull 042019/session-rec-docker
    2. git clone https://github.com/kyraropmet/session-rec.git

Example of Experiments

The data folder contains a small sample dataset. To get an overview of how the framework works, run it with one of the following configuration files:
  • example_next.yml to predict the next item in the session.
  • example_multiple.yml to predict the remaining items of the session.
At the end of the experiments, you can find the evaluation results in the "results" folder. The lists of recommended items are saved in the same folder, in files with the suffix "Saver@".

How to Run It

  1. Dataset preprocessing
    1. Unzip any dataset file to the data folder, e.g., rsc15-clicks.dat will then be in the folder data/rsc15/raw
    2. Open and edit any configuration file in the folder conf/preprocess/.. to configure the preprocessing method and parameters.
      • See, e.g., conf/preprocess/window/rsc15.yml for an example with comments.
    3. Run a configuration with the following command:
      ./dpython run_preprocesing.py conf/preprocess/window/rsc15.yml

  2. Run experiments using the configuration file
    1. Create the folders conf/in and conf/out. Prepare a configuration file (*.yml) and put it into conf/in. Example configuration files are provided in the conf folder. You can prepare several files and put them all into conf/in. Once a configuration file in conf/in has been executed, it is moved to conf/out.
    2. Using Anaconda:
      Run the following command from the main folder:
      python run_config.py conf/in conf/out
      If you want to run a specific configuration file, run the following command:
      python run_config.py conf/example_next.yml
    3. Using Docker:
      Run the following command from the main folder:
      ./dpython run_config.py conf/in conf/out
      If you want to run a specific configuration file, run the following command:
      ./dpython run_config.py conf/example_next.yml
    4. Results and run times will be displayed and saved to the results folder as configured.
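
The conf/in and conf/out workflow described in step 2 behaves like a simple file queue. The following is a minimal sketch of that pattern, not the code in run_config.py. The function name process_queue and the run_one callback are hypothetical and only stand in for whatever the framework does when it executes a configuration.

```python
import shutil
from pathlib import Path


def process_queue(in_dir, out_dir, run_one):
    """Run every *.yml file in in_dir, then move it to out_dir.

    run_one is a hypothetical callback that executes one configuration
    file; it is an assumption made for this sketch.
    """
    in_path, out_path = Path(in_dir), Path(out_dir)
    in_path.mkdir(parents=True, exist_ok=True)
    out_path.mkdir(parents=True, exist_ok=True)

    done = []
    # sorted() materializes the listing, so moving files while looping is safe.
    for conf_file in sorted(in_path.glob("*.yml")):
        run_one(conf_file)
        # A finished configuration leaves the input folder.
        shutil.move(str(conf_file), str(out_path / conf_file.name))
        done.append(conf_file.name)
    return done
```

Calling process_queue("conf/in", "conf/out", print) would only print each path and move the file, which makes it easy to see that files other than *.yml stay where they are.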

How to Configure It

Start from one of the examples in the conf folder.

Essential Options

  • type (example: single). Values: single (one single training-test split), window (sliding-window protocol), opt (parameter optimization).
  • evaluation (example: evaluation_multiple). Values: evaluation (evaluation in terms of the next item), evaluation_last (evaluation in terms of the last item of the session), evaluation_multiple (evaluation in terms of the remaining items of the session).
  • slices (example: 5). Number of slices for the window protocol.
  • opts (example: opts: {sessions_test: 10}). Number of sessions used as a test set during the optimization phase.
  • metrics. List of measures (HitRate, MRR, Precision, Recall, MAP, Coverage, Popularity, Time_usage_training, Time_usage_testing, Memory_usage). Example:
      - class: accuracy.HitRate
        length: [5,10,15,20]
    To save the recommendation lists to files, add:
      - class: saver.Saver
        length: [50]
    The saved recommendations can then be used through the ResultFile class.
  • optimize. Measure for which the parameters are optimized. Example:
      class: accuracy.MRR
      length: [20]
      iterations: 100 #optional
  • algorithms. See example.yml, example_opt.yml and example_hybrid_opt.yml for a complete list of the algorithms and their parameters.
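
To make the length option of the metrics entry concrete, here is a small sketch of how HitRate and MRR at a cut-off length k are commonly computed for next-item prediction. It illustrates the standard definitions only; the framework's accuracy.HitRate and accuracy.MRR classes may differ in detail, and the function names below are made up for this example.

```python
def hit_rate_at_k(ranked_items, next_item, k):
    """Return 1.0 if next_item is among the top-k recommendations, else 0.0."""
    return 1.0 if next_item in ranked_items[:k] else 0.0


def mrr_at_k(ranked_items, next_item, k):
    """Return the reciprocal rank of next_item within the top k, or 0.0."""
    top_k = ranked_items[:k]
    if next_item in top_k:
        return 1.0 / (top_k.index(next_item) + 1)
    return 0.0


def average_metric(metric, predictions, k):
    """Average a metric over (ranked_items, next_item) pairs."""
    scores = [metric(ranked, target, k) for ranked, target in predictions]
    return sum(scores) / len(scores) if scores else 0.0


# Toy predictions: a ranked recommendation list and the item that followed.
predictions = [
    (["a", "b", "c", "d"], "b"),  # correct item at rank 2
    (["c", "a", "d", "b"], "c"),  # correct item at rank 1
    (["d", "c", "b", "a"], "a"),  # correct item at rank 4
]

for k in (1, 3):
    print(k,
          average_metric(hit_rate_at_k, predictions, k),
          average_metric(mrr_at_k, predictions, k))
```

A larger k can only keep or raise both values, which is why results are usually reported for several lengths, as in length: [5,10,15,20].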

How to Extend It

  1. Create a class for your new algorithm.
  2. Implement the following methods:
    • __init__()
    • fit()
    • predict_next()
    • clear()
Tip: look at the implementation of a baseline (e.g., ar.py).
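
As a rough illustration of the four methods listed above, the sketch below implements a toy popularity recommender. Only the method names come from this README. The argument lists, the (session_id, item_id) input format and the returned dict are assumptions made to keep the example free of dependencies, so check a baseline such as ar.py for the signatures the framework actually expects.

```python
from collections import Counter


class MostPopular:
    """Toy algorithm: scores items by their global frequency."""

    def __init__(self, top_n=20):
        self.top_n = top_n
        self.item_counts = Counter()

    def fit(self, train):
        # ASSUMPTION: train is an iterable of (session_id, item_id) pairs.
        # The framework may pass a different data structure.
        for _session_id, item_id in train:
            self.item_counts[item_id] += 1

    def predict_next(self, session_id, input_item_id, predict_for_item_ids):
        # ASSUMPTION: the result is a dict mapping item id -> score.
        # Popularity ignores the session, so the first two arguments are unused.
        total = sum(self.item_counts.values()) or 1
        top = dict(self.item_counts.most_common(self.top_n))
        return {item: top.get(item, 0) / total for item in predict_for_item_ids}

    def clear(self):
        # Release the trained state.
        self.item_counts.clear()
```

Items outside the top_n most frequent ones receive a score of 0, so the class returns a score for every requested item without ranking the long tail.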

Algorithms

Baselines

Algorithm File Description
Association Rules ar.py Simplified version of the association rule mining technique with a maximum rule size of two.
Markov Chains markov.py Variant of association rules with a focus on sequences in the data. The rules are extracted from a first-order Markov Chain.
Sequential Rules sr.py Variation of Markov chains and association rules. It also takes the order of actions into account, but in a less restrictive manner.
BPR-MF bpr.py Rendle et al., BPR: Bayesian Personalized Ranking from Implicit Feedback, UAI 2009.
Context Tree ct.py Mi et al., Context Tree for Adaptive Session-based Recommendation, 2018.
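
The difference between the first two baselines can be sketched in a few lines. This is an illustration of the descriptions above under simplifying assumptions (sessions as plain lists of item ids, raw counts as rule weights), not the code in ar.py or markov.py. All function names are made up for this example.

```python
from collections import defaultdict
from itertools import permutations


def pair_rules(sessions):
    """Association rules of size two: count co-occurrence within a session."""
    rules = defaultdict(lambda: defaultdict(int))
    for items in sessions:
        # Each distinct ordered pair of items in a session counts once.
        for a, b in permutations(set(items), 2):
            rules[a][b] += 1
    return rules


def markov_rules(sessions):
    """First-order Markov chain: count only direct transitions a -> b."""
    rules = defaultdict(lambda: defaultdict(int))
    for items in sessions:
        for a, b in zip(items, items[1:]):
            rules[a][b] += 1
    return rules


def recommend(rules, last_item, k=3):
    """Rank candidates by rule count for the last item of the session."""
    candidates = rules.get(last_item, {})
    # Ties are broken by item id to keep the output deterministic.
    return sorted(candidates, key=lambda item: (-candidates[item], item))[:k]


sessions = [["a", "b", "c"], ["a", "c"], ["b", "a"]]
print(recommend(pair_rules(sessions), "a"))
print(recommend(markov_rules(sessions), "b"))
```

With co-occurrence counting, the order inside a session is irrelevant, whereas the Markov variant only learns from items that directly follow each other.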

Nearest Neighbors

Algorithm File Description
Item-based kNN iknn.py Considers the last element in a given session and then returns those items as recommendations that are most similar to it in terms of their co-occurrence in other sessions.
Session-based kNN sknn.py Recommends items from the most similar sessions, where session distance is determined with the cosine similarity function or the Jaccard index.
Vector Multiplication Session-Based kNN vsknn.py Puts more emphasis on the more recent events of a session when computing the similarities. The weights of the other elements are determined using a linear decay function that depends on the position of the element within the session, where elements appearing earlier in the session obtain a lower weight.
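
The similarity functions mentioned above can be sketched as follows, treating each session as a set of item ids. The linear decay shown here (position divided by session length) is one possible choice and an assumption of this example. The implementations in sknn.py and vsknn.py may weight and normalize differently.

```python
import math


def jaccard(session_a, session_b):
    """Jaccard index of the item sets of two sessions."""
    a, b = set(session_a), set(session_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cosine(session_a, session_b):
    """Cosine similarity of two sessions seen as binary item vectors."""
    a, b = set(session_a), set(session_b)
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


def linear_decay_weights(session):
    """Weight the item at 1-based position i as i / len(session)."""
    n = len(session)
    weights = {}
    for position, item in enumerate(session, start=1):
        # A repeated item keeps the weight of its latest position.
        weights[item] = position / n
    return weights


def decayed_similarity(current_session, neighbor_session):
    """Sum the decay weights of current items that the neighbor contains."""
    weights = linear_decay_weights(current_session)
    neighbor_items = set(neighbor_session)
    return sum(w for item, w in weights.items() if item in neighbor_items)
```

With the current session ["a", "b", "c"], a neighbor that shares only the most recent item "c" scores higher than one that shares only the earliest item "a", which is the intended effect of the decay.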

Neural Networks

Algorithm File Description
Gru4Rec gru4rec.py Hidasi et al., Recurrent Neural Networks with Top-k Gains for Session-based Recommendations, CoRR abs/1706.03847, 2017.
STAMP STAMP.py Liu et al., STAMP: Short-Term Attention/Memory Priority Model for Session-based Recommendation, KDD 2018.
NARM narm.py Li et al., Neural Attentive Session-based Recommendation, CIKM 2017.
NextItNet nextitrec.py Yuan et al., A Simple but Hard-to-Beat Baseline for Session-based Recommendations, CoRR abs/1808.05163, 2018.

Factorization-based Methods

Algorithm File Description
Factorized Personalized Markov Chains fpmc.py Rendle et al., Factorizing Personalized Markov Chains for Next-basket Recommendation. WWW 2010.
Factored Item Similarity Models fism.py Kabbur et al., FISM: Factored Item Similarity Models for top-N Recommender Systems, KDD 2013.
Factorized Sequential Prediction with Item Similarity Models fossil.py He and McAuley. Fusing Similarity Models with Markov Chains for Sparse Sequential Recommendation. CoRR abs/1609.09152, 2016.
Session-based Matrix Factorization smf.py It combines factorized Markov chains with classic matrix factorization. In addition, the method considers the cold-start situation of session-based recommendation scenarios.

Related Datasets

Datasets can be downloaded from: https://www.dropbox.com/sh/n281js5mgsvao6s/AADQbYxSFVPCun5DfwtsSxeda?dl=0

RSC15 The e-commerce dataset used in the 2015 ACM RecSys Challenge.
RETAILROCKET An e-commerce dataset from the company Retail Rocket.
ZALANDO A private dataset consisting of interaction logs from a European fashion retailer.
NOWPLAYING Music listening logs obtained from Twitter.
30MUSIC Music listening logs obtained from Last.fm.
AOTM A public music dataset containing hand-crafted music playlists.
8TRACKS A private music dataset with hand-crafted playlists.

Statistics

Dataset RSC15-S RSC15 TMALL RETAILROCKET ZALANDO
Actions 31,708,461 5,426,961 13,418,695 212,182 4,536,950
Sessions 7,981,581 1,375,128 1,774,729 59,962 365,126
Items 37,483 28,582 425,348 31,968 189,328
Timespan in Days 182 31 90 27 90
Actions per Session 3.97 3.95 7.56 3.54 12.43
Unique Items per Session 3.17 3.17 5.56 2.56 8.39
Actions per Day 174,222.31 175,063.26 149,096.61 7,858.59 50,410.56
Sessions per Day 43,854.84 44,358.97 19,719.22 2,220.84 4,056.96

Dataset 8TRACKS 30MUSIC AOTM NOWPLAYING CLEF
Actions 1,499,645 638,933 306,830 271,177 5,540,486
Sessions 132,453 37,333 21,888 27,005 1,644,442
Items 376,422 210,633 91,166 75,169 742
Timespan in Days 90 90 90 90 6
Actions per Session 11.32 17.11 14.02 10.04 3.37
Unique Items per Session 11.31 14.47 14.01 9.38 3.17
Actions per Day 16,662.72 7,099.26 3,409.22 3,013.08 923,414
Sessions per Day 1,471.70 414.81 243.20 300.06 274,074
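
The per-session and per-day rows in these tables follow directly from the Actions, Sessions and Timespan rows. The short sketch below recomputes them for the first column of the first table (labelled RSC15-S). The function name is made up for this example.

```python
def derived_statistics(actions, sessions, days):
    """Recompute the ratio rows of the statistics table from the totals."""
    return {
        "actions_per_session": round(actions / sessions, 2),
        "actions_per_day": round(actions / days, 2),
        "sessions_per_day": round(sessions / days, 2),
    }


# Totals taken from the first column of the first table above.
stats = derived_statistics(actions=31708461, sessions=7981581, days=182)
print(stats)
```

The row "Unique Items per Session" cannot be derived this way, because it depends on the contents of the individual sessions rather than on the totals.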