Collaborative Filtering

A package for collaborative filtering recommendation.

Currently supported models

  • MF: matrix factorization
  • MLP: multilayer perceptron
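
As a rough illustration of the MF idea (a sketch only, not the package's actual implementation, which would use learned embeddings in a deep-learning framework), a user-item score is the dot product of a user vector and an item vector:

```python
import random

class ToyMF:
    """Toy matrix factorization: score(u, i) = dot(user_vec[u], item_vec[i])."""

    def __init__(self, user_num, item_num, dim=8, seed=0):
        rng = random.Random(seed)
        # Small random latent vectors; in practice these are learned from data.
        self.user_vec = [[rng.gauss(0, 0.1) for _ in range(dim)]
                         for _ in range(user_num)]
        self.item_vec = [[rng.gauss(0, 0.1) for _ in range(dim)]
                         for _ in range(item_num)]

    def score(self, u, i):
        # Higher dot product = stronger predicted preference of user u for item i.
        return sum(a * b for a, b in zip(self.user_vec[u], self.item_vec[i]))
```

The MLP variant would replace the dot product with a small feed-forward network over the concatenated vectors.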

Currently supported loss types

  • BCE: binary cross-entropy
  • CE: cross-entropy (pseudo multiclass classification)
  • BPR: Bayesian personalized ranking
  • GBPR: group Bayesian personalized ranking
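
For intuition, BPR optimizes the probability that an observed (positive) item scores higher than an unobserved (negative) one. A minimal sketch of the pairwise loss in plain Python (illustrative only, not the package's implementation):

```python
import math

def bpr_loss(pos_scores, neg_scores):
    """BPR loss: mean of -log(sigmoid(s_pos - s_neg)) over score pairs."""
    total = 0.0
    for sp, sn in zip(pos_scores, neg_scores):
        # sigmoid of the score difference; a larger positive margin -> lower loss
        total += -math.log(1.0 / (1.0 + math.exp(-(sp - sn))))
    return total / len(pos_scores)
```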

Requirements

  • Linux-based OS
  • Python 3.6+

Get started

Install the package

Install from https://pypi.org/:

pip install collaborative-filtering

Or install it manually:

git clone https://github.com/yusanshi/collaborative-filtering.git
cd collaborative-filtering
pip install .

Prepare the dataset

Specify the dataset with the --dataset_path parameter. Its value must be a directory containing train.tsv, valid.tsv and test.tsv. Each file must be a two-column TSV file with user and item as the header row and 0-based user and item indexes as the body. An example:

user	item
0	241
0	149
0	76
...

The datasets below are based on those used in GBPR, with small changes to the data format so they can be used directly in this project. Download and uncompress them with:

mkdir data && cd data
wget https://github.com/yusanshi/collaborative-filtering/files/7052355/dataset.tar.gz
tar -xzvf dataset.tar.gz

Besides --dataset_path, you must also provide the --user_num and --item_num parameters. For users, make sure this formula holds: user_num >= max(max(training user indexes), max(validation user indexes), max(test user indexes)) + 1. The same holds for items. Based on these formulas, you can write a simple script to compute user_num and item_num for a dataset.

Quick start: if you use the provided ML100K dataset, the values are --user_num 943 --item_num 1682.

Run

python -m collaborative_filtering.train \
  --user_num USER_NUM \
  --item_num ITEM_NUM \
  --negative_sampling_ratio NEGATIVE_SAMPLING_RATIO \
  --model_name {MF,MLP} \
  --loss_type {BCE,CE,BPR,GBPR} \
  --dataset_path DATASET_PATH \
  ...

Only the most important parameters are listed here. For more details, run python -m collaborative_filtering.train -h or see collaborative_filtering/parameters.py.

The stdout log and TensorBoard log are located in os.path.join(log_path, f'{model_name}-{loss_type}-{dataset_path}') and os.path.join(tensorboard_runs_path, f'{model_name}-{loss_type}-{dataset_path}'), respectively. To visualize metrics with TensorBoard, run:

tensorboard --logdir RUNS_PATH

TODO

  • More models
  • More loss types
  • Test
  • Documentation