/UniSRec

[KDD22] Official PyTorch implementation for "Towards Universal Sequence Representation Learning for Recommender Systems".

Primary LanguagePython

UniSRec

This is the official PyTorch implementation for the paper:

Yupeng Hou*, Shanlei Mu*, Wayne Xin Zhao, Yaliang Li, Bolin Ding, Ji-Rong Wen. Towards Universal Sequence Representation Learning for Recommender Systems. KDD 2022.


Updates:

  • [June 28, 2022] We updated some useful "mid product" files that can be obtained during the data preprocessing stage [link], including:
    1. Clean item text (*.text);
    2. Index mapping between raw IDs and remapped IDs (*.user2index, *.item2index);
  • [June 16, 2022] We released the code and scripts for preprocessing ours datasets [link].

Overview

We propose UniSRec, which stands for Universal Sequence representation learning for Recommendation. Aiming to learn more generalizable sequence representations, UniSRec utilizes the associated description text of an item to learn transferable representations across different domains and platforms. For learning universal item representations, we design a lightweight architecture based on parametric whitening and mixture-of-experts enhanced adaptor. For learning universal sequence representations, we introduce two kinds of contrastive learning tasks by sampling multi-domain negatives. With the pre-trained universal sequence representation model, our approach can be effectively transferred to new cross-domain and cross-platform recommendation scenarios in a parameter-efficient way, under either inductive or transductive settings.

Requirements

recbole==1.0.1
python==3.9.7
cudatoolkit==11.3.1
pytorch==1.11.0

Download Datasets and Pre-trained Model

Please download the processed downstream (or pre-trained, if needed) datasets and the pre-trained model from Google Drive or 百度网盘 (密码 3cml).

After unzipping, move pretrain/ and downstream/ to dataset/, and move UniSRec-FHCKM-300.pth to saved/.

Quick Start

Train and evaluate on downstream datasets

Fine-tune the pre-trained UniSRec model in transductive setting.

python finetune.py -d Scientific -p saved/UniSRec-FHCKM-300.pth

You can replace Scientific to Pantry, Instruments, Arts, Office or OR to reproduce the results reported in our paper.

Fine-tune the pre-trained model in inductive setting.

python finetune.py -d Scientific -p saved/UniSRec-FHCKM-300.pth --train_stage inductive_ft

Train UniSRec from scratch (w/o pre-training).

python finetune.py -d Scientific

Run baseline SASRec.

python run_baseline.py -m SASRec -d Scientific --config_files props/finetune.yaml --hidden_size=300

Pre-train from scratch

Pre-train on one single GPU.

python pretrain.py

Pre-train with distributed data parallel on GPU:0-3.

CUDA_VISIBLE_DEVICES=0,1,2,3 python ddp_pretrain.py

Customized Datasets

Please refer to [link] for details of data preprocessing. Then you can correspondingly try your customized datasets.

Acknowledgement

The implementation is based on the open-source recommendation library RecBole.

Please cite the following papers as the references if you use our codes or the processed datasets.

@inproceedings{hou2022unisrec,
  author = {Yupeng Hou and Shanlei Mu and Wayne Xin Zhao and Yaliang Li and Bolin Ding and Ji-Rong Wen},
  title = {Towards Universal Sequence Representation Learning for Recommender Systems},
  booktitle = {{KDD}},
  year = {2022}
}


@inproceedings{zhao2021recbole,
  title={Recbole: Towards a unified, comprehensive and efficient framework for recommendation algorithms},
  author={Wayne Xin Zhao and Shanlei Mu and Yupeng Hou and Zihan Lin and Kaiyuan Li and Yushuo Chen and Yujie Lu and Hui Wang and Changxin Tian and Xingyu Pan and Yingqian Min and Zhichao Feng and Xinyan Fan and Xu Chen and Pengfei Wang and Wendi Ji and Yaliang Li and Xiaoling Wang and Ji-Rong Wen},
  booktitle={{CIKM}},
  year={2021}
}

Special thanks @Juyong Jiang for the excellent DDP implementation (#961).