The code for the paper "MISSRec: Pre-training and Transferring Multi-modal Interest-aware Sequence Representation for Recommendation" (ACM MM'23).

MISSRec: Pre-training and Transferring Multi-modal Interest-aware Sequence Representation for Recommendation

1. Introduction

This repository provides the code for our paper at ACM MM 2023:

MISSRec: Pre-training and Transferring Multi-modal Interest-aware Sequence Representation for Recommendation. Jinpeng Wang, Ziyun Zeng, Yunxiao Wang, Yuting Wang, Xingyu Lu, Tianxiang Li, Jun Yuan, Rui Zhang, Hai-Tao Zheng, Shu-Tao Xia. 📝[Paper]. 🖼️[Poster]. 📺[2-min Video]. 🇨🇳[Chinese-language explainer (PaperWeekly)].

We propose MISSRec, a multi-modal pre-training and transfer learning framework for sequential recommendation. On the user side, we first design a clustering-based interest discovery algorithm to mine users' interests from their multi-modal behaviors. Then, we build a Transformer-based encoder-decoder model, where the encoder learns to capture personalization cues from interest tokens while the decoder is developed to grasp item-modality-interest relations for better sequence representation. On the candidate item side, we adopt a dynamic fusion module to produce user-adaptive item representation. We pre-train the model with contrastive learning objectives and fine-tune it in an efficient manner. Experiments demonstrate the effectiveness and flexibility of MISSRec, indicating a practical solution for real-world recommendation scenarios.
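To make the phrase "contrastive learning objectives" more concrete, here is a minimal, generic InfoNCE-style loss in plain Python. It is an illustrative sketch only and not the exact objective implemented in this repository: the function names, the temperature of 0.07, and the pairing of "queries" (sequence representations) with "keys" (target item representations) are all assumptions made for the example.

```python
import math
from typing import List, Sequence


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def l2_normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v] if norm > 0 else list(v)


def info_nce(queries: Sequence[Sequence[float]],
             keys: Sequence[Sequence[float]],
             temperature: float = 0.07) -> float:
    """Mean InfoNCE loss; keys[i] is the positive for queries[i],
    and all other keys in the batch act as negatives."""
    q = [l2_normalize(x) for x in queries]
    k = [l2_normalize(x) for x in keys]
    losses = []
    for i, qi in enumerate(q):
        logits = [dot(qi, kj) / temperature for kj in k]
        # Numerically stable log-sum-exp over all candidates.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
        losses.append(log_denominator - logits[i])
    return sum(losses) / len(losses)
```

The loss is close to zero when each query is most similar to its own key and grows when a negative is more similar than the positive.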

The following sections walk through how to use this repository, step by step. 🤗

2. Preparation

git clone https://github.com/gimpong/MM23-MISSRec.git
cd MM23-MISSRec/

2.1 Requirements

  • cuda 11.7
  • python 3.7.8
  • pytorch 1.13.1
  • numpy 1.21.6
  • cupy 11.6.0
  • tqdm 4.64.1
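The versions above can be checked with a short script such as the sketch below. It is not part of the repository; the helper names are hypothetical, and it assumes each package exposes a `__version__` attribute and that a local build suffix such as "+cu117" should be ignored when comparing.

```python
import importlib
from typing import Dict, Optional

# Package versions copied from the requirements list above.
REQUIRED = {
    "torch": "1.13.1",
    "numpy": "1.21.6",
    "cupy": "11.6.0",
    "tqdm": "4.64.1",
}


def installed_version(module_name: str) -> Optional[str]:
    """Return the module's __version__, or None if it cannot be imported."""
    try:
        module = importlib.import_module(module_name)
    except Exception:  # broad on purpose: a broken install should be reported, not crash the check
        return None
    return getattr(module, "__version__", None)


def version_mismatches(required: Dict[str, str],
                       installed: Dict[str, Optional[str]]) -> Dict[str, str]:
    """Map each package whose version differs from the requirement to a message."""
    problems = {}
    for name, wanted in required.items():
        found = installed.get(name)
        # Builds such as "1.13.1+cu117" are compared on the part before "+".
        if found is None or found.split("+")[0] != wanted:
            problems[name] = f"expected {wanted}, found {found or 'not installed'}"
    return problems


if __name__ == "__main__":
    installed = {name: installed_version(name) for name in REQUIRED}
    for name, message in version_mismatches(REQUIRED, installed).items():
        print(f"{name}: {message}")
```

If nothing is printed, the installed packages match the listed versions. The CUDA and Python versions are not checked by this sketch.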

2.2 Data Preparation

Before running the code, make sure the working directory is organized as follows:

MM23-MISSRec/
  • misc/
  • data/
  • reference_log/
  • props/
  • recbole/
  • torchpq/
  • saved/
    • MISSRec-FHCKM_mm_full-10.pth
    • MISSRec-FHCKM_mm_full-20.pth
    • ...
    • MISSRec-FHCKM_mm_full-100.pth
  • datasets/
    • pretrain/
      • FHCKM_mm_full/
    • downstream/
      • Scientific_mm_subset/
      • Scientific_mm_full/
      • Pantry_mm_subset/
      • Pantry_mm_full/
      • Office_mm_subset/
      • Office_mm_full/
      • Instruments_mm_subset/
      • Instruments_mm_full/
      • Arts_mm_subset/
      • Arts_mm_full/
        • Arts_mm_full.feat1CLS
        • Arts_mm_full.feat3CLS
        • Arts_mm_full.text
        • Arts_mm_full.item2index
        • Arts_mm_full.user2index
        • Arts_mm_full.test.inter
        • Arts_mm_full.train.inter
        • Arts_mm_full.valid.inter
  • scripts/
    • run01.sh
    • run02.sh
    • ...
  • cluster_utils.py
  • config.py
  • ddp_finetune.py
  • ddp_pretrain.py
  • finetune.py
  • missrec.py
  • model_utils.py
  • trainer.py
  • utils.py
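A quick way to confirm that a dataset folder matches this layout is sketched below. The helper is hypothetical and not part of the repository, and it assumes that every dataset folder holds the same eight files shown for Arts_mm_full, which the tree above only confirms for that one dataset.

```python
from pathlib import Path
from typing import List

# File suffixes listed under Arts_mm_full in the tree above.
EXPECTED_SUFFIXES = [
    "feat1CLS", "feat3CLS", "text", "item2index", "user2index",
    "test.inter", "train.inter", "valid.inter",
]


def missing_dataset_files(root: Path, dataset: str,
                          split: str = "downstream") -> List[str]:
    """Return the expected file names that are absent from
    <root>/datasets/<split>/<dataset>/."""
    folder = Path(root) / "datasets" / split / dataset
    return [
        f"{dataset}.{suffix}"
        for suffix in EXPECTED_SUFFIXES
        if not (folder / f"{dataset}.{suffix}").is_file()
    ]
```

For example, `missing_dataset_files(Path("."), "Arts_mm_full")` run from the repository root returns an empty list when all eight files are in place.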

Notes

  • The pre-processed datasets with extracted features can be downloaded from Google Drive. For each sub-dataset (e.g., Arts_mm_full), text and image features are saved in files with the suffixes ".feat1CLS" and ".feat3CLS", respectively, e.g., Arts_mm_full.feat1CLS and Arts_mm_full.feat3CLS. A "subset" dataset is the filtered version of its "full" counterpart: items with incomplete modalities are removed, so only items with all modalities remain.

  • Customized feature extraction: We use the pre-trained CLIP-ViT-B/32 as the feature extractor for texts and images. You may want to use other feature extractors for the raw data. The raw text information can be obtained from the review data of the Amazon dataset. For the raw images, you can either crawl them according to URLs or download the version we crawled via Baidu Cloud (password: 791e).

  • Customized datasets: First, pre-process the user-item interaction data according to the instructions. Then you may use the pre-trained CLIP-ViT-B/32 to extract multi-modal item features.

  • saved/MISSRec-FHCKM_mm_full-*0.pth are checkpoint files generated during pre-training (see Section 3).
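To see which pre-training checkpoints are present in saved/, a small helper like the following can be used. It is a sketch outside the repository's own code; it assumes the number in each file name is the epoch count, as suggested by the names MISSRec-FHCKM_mm_full-10.pth through MISSRec-FHCKM_mm_full-100.pth.

```python
import re
from pathlib import Path
from typing import List, Tuple

# Matches names such as MISSRec-FHCKM_mm_full-20.pth and captures the number.
CKPT_PATTERN = re.compile(r"^MISSRec-FHCKM_mm_full-(\d+)\.pth$")


def list_checkpoints(saved_dir: str) -> List[Tuple[int, Path]]:
    """Return (epoch, path) pairs for checkpoints in saved_dir, sorted by epoch."""
    found = []
    for path in Path(saved_dir).glob("MISSRec-FHCKM_mm_full-*.pth"):
        match = CKPT_PATTERN.match(path.name)
        if match:
            found.append((int(match.group(1)), path))
    # Sorting numerically avoids the lexicographic order 10, 100, 20, ...
    return sorted(found)
```

Calling `list_checkpoints("saved")[-1]` then gives the checkpoint with the highest number, provided at least one checkpoint exists.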

3. Pre-training

To pre-train the model for 100 epochs, run the following command in a multi-GPU environment:

# an example: pre-training on 4 GPUs
CUDA_VISIBLE_DEVICES="0,1,2,3" python ddp_pretrain.py

We have provided pre-trained checkpoints on Google Drive.

4. Fine-tuning or From-scratch Training on Downstream Datasets

For convenience, we provide a script with the configuration for each experiment. These scripts are located in the scripts/ folder. For example, to fine-tune the pre-trained checkpoint on the Scientific dataset, run:

cd scripts/
# '0' is the id of GPU
bash run01.sh 0

The script run01.sh contains the following commands:

#!/bin/bash
cd ..
CUDA_VISIBLE_DEVICES=$1 python finetune.py \
    -d Scientific_mm_full \
    -mode transductive
cd -
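The same command can also be launched from Python, which may be handy when looping over several datasets. The sketch below is not part of the repository: the function names are hypothetical, and it uses only the `-d` and `-mode transductive` arguments shown in run01.sh.

```python
import os
import subprocess
from typing import Dict, List


def build_finetune_command(dataset: str, mode: str = "transductive") -> List[str]:
    """Build the finetune.py invocation shown in run01.sh."""
    return ["python", "finetune.py", "-d", dataset, "-mode", mode]


def build_env(gpu_id: str) -> Dict[str, str]:
    """Copy the current environment and restrict the visible GPUs."""
    env = dict(os.environ)
    env["CUDA_VISIBLE_DEVICES"] = gpu_id
    return env


def run_finetune(dataset: str, gpu_id: str = "0",
                 mode: str = "transductive", repo_root: str = ".") -> int:
    """Run fine-tuning from the repository root and return the exit code."""
    completed = subprocess.run(
        build_finetune_command(dataset, mode),
        cwd=repo_root,
        env=build_env(gpu_id),
        check=False,
    )
    return completed.returncode
```

For instance, `run_finetune("Scientific_mm_full", gpu_id="0")` issues the same command as `bash run01.sh 0`, assuming it is called with `repo_root` pointing at the MM23-MISSRec/ directory.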

4.1 Main results on downstream domains

Script    Dataset      With ID?  Pre-trained?  Log    R@10    N@10    R@50    N@50
run01.sh  Scientific   ✓         ✗             log01  0.1282  0.0711  0.2376  0.0946
run02.sh  Scientific   ✓         ✓             log02  0.1360  0.0753  0.2431  0.0983
run03.sh  Scientific   ✗         ✗             log03  0.1269  0.0659  0.2354  0.0891
run04.sh  Scientific   ✗         ✓             log04  0.1278  0.0658  0.2375  0.0893
run05.sh  Pantry       ✓         ✗             log05  0.0771  0.0363  0.1804  0.0583
run06.sh  Pantry       ✓         ✓             log06  0.0779  0.0365  0.1875  0.0598
run07.sh  Pantry       ✗         ✗             log07  0.0715  0.0337  0.1801  0.0569
run08.sh  Pantry       ✗         ✓             log08  0.0771  0.0345  0.1833  0.0571
run09.sh  Instruments  ✓         ✗             log09  0.1292  0.0842  0.2369  0.1072
run10.sh  Instruments  ✓         ✓             log10  0.1300  0.0843  0.2370  0.1071
run11.sh  Instruments  ✗         ✗             log11  0.1207  0.0771  0.2191  0.0981
run12.sh  Instruments  ✗         ✓             log12  0.1201  0.0771  0.2218  0.0988
run13.sh  Arts         ✓         ✗             log13  0.1279  0.0744  0.2387  0.0982
run14.sh  Arts         ✓         ✓             log14  0.1314  0.0767  0.2410  0.1002
run15.sh  Arts         ✗         ✗             log15  0.1107  0.0641  0.2093  0.0853
run16.sh  Arts         ✗         ✓             log16  0.1119  0.0625  0.2100  0.0836
run17.sh  Office       ✓         ✗             log17  0.1269  0.0848  0.2001  0.1005
run18.sh  Office       ✓         ✓             log18  0.1275  0.0856  0.2005  0.1012
run19.sh  Office       ✗         ✗             log19  0.1072  0.0694  0.1726  0.0834
run20.sh  Office       ✗         ✓             log20  0.1038  0.0666  0.1701  0.0808
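As a worked example of reading these results, the sketch below computes the relative change in R@10 between the non-pre-trained and pre-trained runs in the "With ID" setting. The numbers are copied from the table; pairing each odd-numbered script with the following even-numbered one (e.g., run01.sh with run02.sh) is an assumption about how the rows are grouped, and the helper name is hypothetical.

```python
# R@10 in the "With ID" setting: (without pre-training, with pre-training),
# copied from run01/02, run05/06, run09/10, run13/14, and run17/18.
R_AT_10_WITH_ID = {
    "Scientific": (0.1282, 0.1360),
    "Pantry": (0.0771, 0.0779),
    "Instruments": (0.1292, 0.1300),
    "Arts": (0.1279, 0.1314),
    "Office": (0.1269, 0.1275),
}


def relative_gain(baseline: float, improved: float) -> float:
    """Relative change from baseline to improved, as a fraction of baseline."""
    return (improved - baseline) / baseline


for dataset, (scratch, pretrained) in R_AT_10_WITH_ID.items():
    print(f"{dataset:<12} {relative_gain(scratch, pretrained):+.2%}")
```

On Scientific, for example, this gives (0.1360 - 0.1282) / 0.1282, a relative gain of roughly 6%.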

5. References

If you find this code useful or use the toolkit in your work, please consider citing:

@inproceedings{wang23missrec,
  title={MISSRec: Pre-training and Transferring Multi-modal Interest-aware Sequence Representation for Recommendation},
  author={Jinpeng Wang and Ziyun Zeng and Yunxiao Wang and Yuting Wang and Xingyu Lu and Tianxiang Li and Jun Yuan and Rui Zhang and Haitao Zheng and Shu-Tao Xia},
  booktitle = {Proceedings of the 31st ACM International Conference on Multimedia},
  year={2023}
}

6. Acknowledgements

Our code is based on the implementation of UniSRec and TorchPQ.

7. Contact

If you have any questions, please raise an issue or email Jinpeng Wang (wjp20@mails.tsinghua.edu.cn). We will reply as soon as possible.