This is the official PyTorch implementation of our CIKM 2023 paper, "AdaMCT: Adaptive Mixture of CNN-Transformer for Sequential Recommendation", built on the well-known open-source framework RecBole.
Figure 1. AdaMCT Model Architecture.
RecBole is developed based on Python and PyTorch for reproducing and developing recommendation algorithms in a unified, comprehensive and efficient framework for research purposes. Its library includes 78 recommendation algorithms, covering four major categories:
- General Recommendation
- Sequential Recommendation
- Context-aware Recommendation
- Knowledge-based Recommendation
RecBole designs a unified and flexible data file format and provides support for 28 benchmark recommendation datasets. A user can apply the provided scripts to process the original data, or simply download the datasets already processed by the RecBole team.
Figure 2. RecBole Overall Architecture
RecBole works with the following operating systems:
- Linux
- Windows 10
- macOS
RecBole requires Python version 3.7 or later.
RecBole requires PyTorch version 1.7.0 or later. If you want to use RecBole with a GPU, please ensure that your CUDA or cudatoolkit version is 9.2 or later. This requires an NVIDIA driver version >= 396.26 (for Linux) or >= 397.44 (for Windows 10).
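A quick way to check that the local PyTorch/CUDA setup meets these requirements (a minimal check, independent of RecBole):

```python
import torch

# PyTorch version should be >= 1.7.0; the CUDA build version should be >= 9.2 for GPU use.
print("torch:", torch.__version__)
print("cuda:", torch.version.cuda)

# True only if a CUDA-capable GPU and a compatible NVIDIA driver are visible.
print("gpu available:", torch.cuda.is_available())
```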
Install from conda:
conda install -c aibox recbole
Install from pip:
pip install recbole
Install from source:
git clone https://github.com/RUCAIBox/RecBole.git && cd RecBole
pip install -e . --verbose
With the source code, you can run the provided script to try out RecBole's library:
For example, this script will run the BPR model on the ml-100k dataset by default.
python run_recbole.py
Typically, this example takes less than one minute and produces output like the following:
INFO ml-100k
The number of users: 944
Average actions of users: 106.04453870625663
The number of items: 1683
Average actions of items: 59.45303210463734
The number of inters: 100000
The sparsity of the dataset: 93.70575143257098%
INFO Evaluation Settings:
Group by user_id
Ordering: {'strategy': 'shuffle'}
Splitting: {'strategy': 'by_ratio', 'ratios': [0.8, 0.1, 0.1]}
Negative Sampling: {'strategy': 'full', 'distribution': 'uniform'}
INFO BPRMF(
(user_embedding): Embedding(944, 64)
(item_embedding): Embedding(1683, 64)
(loss): BPRLoss()
)
Trainable parameters: 168128
INFO epoch 0 training [time: 0.27s, train loss: 27.7231]
INFO epoch 0 evaluating [time: 0.12s, valid_score: 0.021900]
INFO valid result:
recall@10: 0.0073 mrr@10: 0.0219 ndcg@10: 0.0093 hit@10: 0.0795 precision@10: 0.0088
...
INFO epoch 63 training [time: 0.19s, train loss: 4.7660]
INFO epoch 63 evaluating [time: 0.08s, valid_score: 0.394500]
INFO valid result:
recall@10: 0.2156 mrr@10: 0.3945 ndcg@10: 0.2332 hit@10: 0.7593 precision@10: 0.1591
INFO Finished training, best eval result in epoch 52
INFO Loading model structure and parameters from saved/***.pth
INFO best valid result:
recall@10: 0.2169 mrr@10: 0.4005 ndcg@10: 0.235 hit@10: 0.7582 precision@10: 0.1598
INFO test result:
recall@10: 0.2368 mrr@10: 0.4519 ndcg@10: 0.2768 hit@10: 0.7614 precision@10: 0.1901
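Besides the command-line script, the same quick-start run can be launched from Python. The sketch below uses RecBole's quick-start API (recbole.quick_start.run_recbole); argument names may vary slightly across RecBole versions:

```python
from recbole.quick_start import run_recbole

# Train and evaluate BPR on ml-100k with the default configuration;
# this mirrors `python run_recbole.py` above.
run_recbole(model='BPR', dataset='ml-100k')
```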
If you want to change the arguments, such as gpu_id, model, learning_rate, embedding_size, dataset, and config_files, just pass the corresponding command-line arguments as needed:
python run_recbole.py \
    --model=[model_name] \
    --learning_rate=[0.0001] \
    --embedding_size=[128] \
    --dataset=['Amazon_Toys_and_Games'] \
    --config_files=['config/Amazon_Toys_and_Games.yaml']
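The same overrides can also be supplied programmatically. A hedged sketch, assuming run_recbole accepts a config_file_list and a config_dict of overrides (as in RecBole's quick-start API); the concrete values below are placeholders:

```python
from recbole.quick_start import run_recbole

# Placeholder values for illustration; replace with your own model/dataset/config.
run_recbole(
    model='BPR',
    dataset='Amazon_Toys_and_Games',
    config_file_list=['config/Amazon_Toys_and_Games.yaml'],
    config_dict={'learning_rate': 0.0001, 'embedding_size': 128, 'gpu_id': 0},
)
```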
Please download the three widely used benchmarks from RecSysDatasets or the Google Drive link provided by the RecBole team. Then place the preprocessed files in ./dataset as follows:
dataset
├── Amazon_Beauty
│ ├── Amazon_Beauty.inter
│ └── Amazon_Beauty.item
├── Amazon_Toys_and_Games
│ ├── Amazon_Toys_and_Games.inter
│ └── Amazon_Toys_and_Games.item
└── Amazon_Sports_and_Outdoors
├── Amazon_Sports_and_Outdoors.inter
└── Amazon_Sports_and_Outdoors.item
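Each .inter file is a tab-separated RecBole atomic file whose header names every column together with its type (e.g. user_id:token). A quick sanity check with pandas (a sketch; the exact column set depends on the preprocessed files you downloaded):

```python
import pandas as pd

# RecBole atomic files are TSVs whose headers carry the field type, e.g. "user_id:token".
inter = pd.read_csv('dataset/Amazon_Beauty/Amazon_Beauty.inter', sep='\t')
print(inter.columns.tolist())   # e.g. ['user_id:token', 'item_id:token', ...]
print(len(inter), 'interactions')
```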
Note that our experiments are conducted on a single NVIDIA A100 SXM4 80GB GPU. You can find our experimental logging files in ./exp_log, including AdaMCT-Amazon_Beauty-Jun-05-2023_01-47-32-ce1763, AdaMCT-Amazon_Sports_and_Outdoors-Jun-05-2023_04-49-07-5c1f03, and AdaMCT-Amazon_Toys_and_Games-Jun-05-2023_04-17-39-18a08b.
# Beauty
python run_recbole.py --gpu_id='0' --model=AdaMCT --dataset='Amazon_Beauty' --config_files='config/Amazon_Beauty.yaml'
# Toys
python run_recbole.py --gpu_id='1' --model=AdaMCT --dataset='Amazon_Toys_and_Games' --config_files='config/Amazon_Toys_and_Games.yaml'
# Sports
python run_recbole.py --gpu_id='2' --model=AdaMCT --dataset='Amazon_Sports_and_Outdoors' --config_files='config/Amazon_Sports_and_Outdoors.yaml'
or
bash run_adamct.sh
The terminal output of the final results will look like the following (we report the test results in our paper):
Toys
Mon 05 Jun 2023 07:15:56 INFO best valid : OrderedDict([('recall@1', 0.2533), ('recall@5', 0.4599), ('recall@10', 0.5609), ('recall@20', 0.6785), ('ndcg@1', 0.2533), ('ndcg@5', 0.3621), ('ndcg@10', 0.3947), ('ndcg@20', 0.4243)])
Mon 05 Jun 2023 07:15:56 INFO test result: OrderedDict([('recall@1', 0.2184), ('recall@5', 0.4132), ('recall@10', 0.5088), ('recall@20', 0.6252), ('ndcg@1', 0.2184), ('ndcg@5', 0.3203), ('ndcg@10', 0.3511), ('ndcg@20', 0.3804)])
Beauty
Mon 05 Jun 2023 02:37:56 INFO best valid : OrderedDict([('recall@1', 0.2519), ('recall@5', 0.4626), ('recall@10', 0.5621), ('recall@20', 0.6704), ('ndcg@1', 0.2519), ('ndcg@5', 0.363), ('ndcg@10', 0.3951), ('ndcg@20', 0.4224)])
Mon 05 Jun 2023 02:37:56 INFO test result: OrderedDict([('recall@1', 0.2178), ('recall@5', 0.4207), ('recall@10', 0.5178), ('recall@20', 0.6314), ('ndcg@1', 0.2178), ('ndcg@5', 0.3241), ('ndcg@10', 0.3554), ('ndcg@20', 0.384)])
Sports
Mon 05 Jun 2023 08:40:20 INFO best valid : OrderedDict([('recall@1', 0.222), ('recall@5', 0.469), ('recall@10', 0.6002), ('recall@20', 0.7392), ('ndcg@1', 0.222), ('ndcg@5', 0.3503), ('ndcg@10', 0.3926), ('ndcg@20', 0.4278)])
Mon 05 Jun 2023 08:40:20 INFO test result: OrderedDict([('recall@1', 0.1862), ('recall@5', 0.4097), ('recall@10', 0.5412), ('recall@20', 0.6871), ('ndcg@1', 0.1862), ('ndcg@5', 0.3019), ('ndcg@10', 0.3444), ('ndcg@20', 0.3813)])
Open ./hyper_adamct.test and set the hyperparameters to be auto-searched in the parameter list. There are two ways to define the search range for each hyperparameter:
- loguniform: the parameter is sampled log-uniformly, i.e., as e^x with x drawn uniformly from the given range (here from e^{-8} to e^{0}); see the small sketch after this list.
- choice: the parameter takes discrete values from the given list.
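As an illustration of the loguniform semantics (a standalone Python sketch, not RecBole's own tuning code):

```python
import math, random

# "loguniform -8,0": draw x uniformly from [-8, 0], then use e^x as the value,
# so the sampled learning rates lie between e^-8 (~3.4e-4) and e^0 (= 1.0).
samples = [math.exp(random.uniform(-8, 0)) for _ in range(5)]
print(samples)
```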
Here is our hyperparameter search space in ./hyper_adamct.test:
learning_rate loguniform -8,0
hidden_size choice [16,32,64,96,128]
n_heads choice [2,4,8,16,32]
reduction_ratio choice [1,2,5,10,25]
kernel_size choice [3,5,7,9,11]
hidden_dropout_prob loguniform -1,0
attn_dropout_prob loguniform -1,0
hidden_act choice ["gelu","relu","swish","tanh","sigmoid"]
Set the training command-line parameters as needed and run:
python run_hyper.py --model=[model_name] --dataset=[data_name] --config_files=xxxx.yaml --params_file=hyper.test
e.g.,
# Amazon Beauty
python run_hyper.py --gpu_id='0' --model=AdaMCT --dataset='Amazon_Beauty' --config_files='config/Amazon_Beauty.yaml' --params_file='hyper_adamct.test'
# Amazon Sports and Outdoors
python run_hyper.py --gpu_id='1' --model=AdaMCT --dataset='Amazon_Sports_and_Outdoors' --config_files='config/Amazon_Sports_and_Outdoors.yaml' --params_file='hyper_adamct.test'
# Amazon Toys and Games
python run_hyper.py --gpu_id='2' --model=AdaMCT --dataset='Amazon_Toys_and_Games' --config_files='config/Amazon_Toys_and_Games.yaml' --params_file='hyper_adamct.test'
Note that --config_files=test.yaml is optional; if you don't have any customized config settings, this parameter can be omitted.
This process may take a long time before it outputs the best hyperparameters and the corresponding results:
e.g.,
running parameters:
{'embedding_size': 64, 'learning_rate': 0.005947474154838498, 'mlp_hidden_size': '[64,64,64]', 'train_batch_size': 512}
0%| | 0/18 [00:00<?, ?trial/s, best loss=?]
More information about hyperparameter tuning can be found in docs.
Although many works in the literature have emphasized the importance of the Position-wise Feed-Forward Network (FFN) in the Transformer, AdaMCT is, to our knowledge, the first attempt to remove the FFN, which reduces the number of parameters dramatically (76%~93% fewer). We attribute this to the locality inductive bias injected into the Transformer and to the module- and layer-aware adaptive mixture units, which adaptively determine, on a personalized basis, the mixing importance of the global attention mechanism and the local convolutional filter. This also suggests an alternative backbone for significantly reducing the size of large language models (LLMs), which we will explore in future work.
Table 1. Parameter count and execution-efficiency analysis of SOTA models on the Beauty dataset.
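To make the mechanism concrete, here is a minimal PyTorch sketch of an adaptive mixture unit that blends a global self-attention branch with a local depth-wise convolution branch using an instance-dependent, squeeze-excitation-style importance weight. It is an illustration under our own simplifying assumptions (a single layer, a single scalar mixing weight, PyTorch >= 1.9 for batch_first), not the exact AdaMCT module; see the paper and the code in this repo for the real architecture.

```python
import torch
import torch.nn as nn

class AdaptiveMixtureLayer(nn.Module):
    """Sketch: adaptively mix global self-attention and a local convolutional filter."""

    def __init__(self, hidden_size=64, n_heads=2, kernel_size=7, reduction_ratio=2):
        super().__init__()
        self.attn = nn.MultiheadAttention(hidden_size, n_heads, batch_first=True)
        # Depth-wise 1D convolution captures local sequential patterns.
        self.conv = nn.Conv1d(hidden_size, hidden_size, kernel_size,
                              padding=kernel_size // 2, groups=hidden_size)
        # Squeeze-excitation-style gate: pool over the sequence, then predict a
        # mixing weight in (0, 1) for the attention branch.
        self.gate = nn.Sequential(
            nn.Linear(hidden_size, hidden_size // reduction_ratio),
            nn.ReLU(),
            nn.Linear(hidden_size // reduction_ratio, 1),
            nn.Sigmoid(),
        )
        self.norm = nn.LayerNorm(hidden_size)

    def forward(self, x):                      # x: (batch, seq_len, hidden)
        global_out, _ = self.attn(x, x, x)     # long-range dependencies
        local_out = self.conv(x.transpose(1, 2)).transpose(1, 2)  # local patterns
        alpha = self.gate(x.mean(dim=1)).unsqueeze(1)  # (batch, 1, 1) per-user weight
        return self.norm(alpha * global_out + (1 - alpha) * local_out + x)

# Toy usage: a batch of 4 sequences of length 50 with hidden size 64.
layer = AdaptiveMixtureLayer()
print(layer(torch.randn(4, 50, 64)).shape)     # torch.Size([4, 50, 64])
```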
RecBole uses MIT License. All data and code in this project can only be used for academic purposes.
If you use the data or code in this repo, please cite our paper:
@article{jiang2022adamct,
title={AdaMCT: adaptive mixture of CNN-transformer for sequential recommendation},
author={Jiang, Juyong and Zhang, Peiyan and Luo, Yingtao and Li, Chaozhuo and Kim, Jae Boum and Zhang, Kai and Wang, Senzhang and Xie, Xing and Kim, Sunghun},
journal={arXiv preprint arXiv:2205.08776},
year={2022}
}
Naturally, you should also cite the original RecBole framework.