
MuLe

This is the official code for MuLe (Multi-Grained Graph Learning for Multi-Behavior Recommendation), accepted to CIKM 2024.

Prerequisites

You can install the required packages in a conda environment by typing the following commands in your terminal:

conda create -n MULE python=3.9
conda activate MULE
pip install -r requirements.txt

Datasets

The statistics of the datasets used in MuLe are summarized below. The percentage next to each auxiliary behavior is the ratio of its interactions that also appear in the target behavior (e.g., views on items that were eventually bought); a small sketch of this computation is given after the table.

| Dataset | Users | Items | Views | Collects | Carts | Buys |
|---------|-------|-------|-------|----------|-------|------|
| Taobao | 15,449 | 11,953 | 873,954 (9%) | - | 195,476 (10%) | 92,180 |
| Tmall | 41,738 | 11,953 | 1,813,498 (12%) | 221,514 (12%) | 1,996 (15%) | 255,586 |
| Jdata | 93,334 | 24,624 | 1,681,430 (16%) | 45,613 (43%) | 49,891 (57%) | 321,883 |
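
For illustration, here is a minimal sketch of how such an overlap percentage can be computed, assuming each behavior is given as a set of (user, item) pairs; the actual preprocessing script may compute the statistic differently.

# Hypothetical sketch: fraction of auxiliary interactions (e.g., views) that
# also appear in the target behavior (buys).
def overlap_ratio(aux_pairs, buy_pairs):
    """aux_pairs, buy_pairs: sets of (user_id, item_id) tuples."""
    if not aux_pairs:
        return 0.0
    return len(set(aux_pairs) & set(buy_pairs)) / len(aux_pairs)

# Toy example: 1 of 3 viewed items was also bought -> 33.3%
views = {(0, 10), (0, 11), (1, 12)}
buys = {(0, 10), (2, 13)}
print(f"{overlap_ratio(views, buys):.1%}")  # 33.3%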

We gathered the Tmall and Jdata datasets from CRGCN and the Taobao dataset from MB-CGCN. To preprocess the datasets for use with our code, run the following command:

python ./data/preprocess.py

Usage

Train our model from scratch

You can train the model with the best hyperparameters for each dataset by typing one of the following commands in your terminal:

Train MuLe on the Taobao dataset

python ./src/main.py --dataset taobao \
                     --lr 1e-4 \
                     --weight_decay 0 \
                     --tda_layers 4 \
                     --gnn_layers 1 \
                     --emb_dim 64 \
                     --num_epochs 100 \
                     --batch_size 1024

Train MuLe on the Tmall dataset

python ./src/main.py --dataset tmall \
                     --lr 5e-4 \
                     --weight_decay 1e-5 \
                     --tda_layers 5 \
                     --gnn_layers 1 \
                     --emb_dim 64 \
                     --num_epochs 100 \
                     --batch_size 1024

Train MuLe on the Jdata dataset

python ./src/main.py --dataset jdata \
                     --lr 1e-3 \
                     --weight_decay 0 \
                     --tda_layers 5 \
                     --gnn_layers 1 \
                     --emb_dim 64 \
                     --num_epochs 100 \
                     --batch_size 1024
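
If you want to reproduce all three configurations in one go, a small driver script like the following can be used. This is a convenience sketch, not part of the repository; it simply reissues the commands listed above via subprocess.

import subprocess

# Best hyperparameters per dataset, copied from the commands above.
CONFIGS = {
    "taobao": {"lr": "1e-4", "weight_decay": "0",    "tda_layers": "4"},
    "tmall":  {"lr": "5e-4", "weight_decay": "1e-5", "tda_layers": "5"},
    "jdata":  {"lr": "1e-3", "weight_decay": "0",    "tda_layers": "5"},
}

for dataset, cfg in CONFIGS.items():
    cmd = ["python", "./src/main.py", "--dataset", dataset,
           "--lr", cfg["lr"], "--weight_decay", cfg["weight_decay"],
           "--tda_layers", cfg["tda_layers"], "--gnn_layers", "1",
           "--emb_dim", "64", "--num_epochs", "100", "--batch_size", "1024"]
    subprocess.run(cmd, check=True)  # run each training job sequentially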

Use a pre-trained MuLe

Because of GitHub's file size limit, we provide the pre-trained parameters of MuLe for each dataset on Google Drive. You can download the pre-trained files as follows:

gdown --folder https://drive.google.com/drive/folders/1L5wnVGQ6EhBy9wyPJTOxsU9JD7zIIvrc 

After downloading the pre-trained parameters, you can run the pre-trained model by adding the --load_checkpoint option to the corresponding training command above.
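
For example, the Taobao command from above with the checkpoint option appended would look as follows (this assumes --load_checkpoint is a store-true flag, as its default value False in Detailed Options suggests):

python ./src/main.py --dataset taobao \
                     --lr 1e-4 \
                     --weight_decay 0 \
                     --tda_layers 4 \
                     --gnn_layers 1 \
                     --emb_dim 64 \
                     --num_epochs 100 \
                     --batch_size 1024 \
                     --load_checkpoint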

Result of Pre-trained MuLe

The test performance of the pre-trained MuLe on each dataset is as follows:

| Dataset | HR@10 | NDCG@10 |
|---------|-------|---------|
| Taobao | 0.1939 | 0.1109 |
| Tmall | 0.2109 | 0.1165 |
| Jdata | 0.5820 | 0.4147 |

All experiments were conducted on an RTX 4090 (24 GB) GPU with CUDA 11.8, and the above results were reproduced with the random seed fixed to 42 (--seed 42).
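
For reference, the sketch below shows one common way to compute HR@K and NDCG@K under leave-one-out evaluation with a single held-out item per user; the evaluation code in this repository may differ in details.

import math

def hr_ndcg_at_k(ranked_items, target_item, k=10):
    """ranked_items: item ids sorted by predicted score (best first);
    target_item: the held-out ground-truth item for this user."""
    topk = ranked_items[:k]
    if target_item not in topk:
        return 0.0, 0.0                     # no hit within the top-k list
    rank = topk.index(target_item)          # 0-based position of the hit
    return 1.0, 1.0 / math.log2(rank + 2)   # HR@K, NDCG@K for a single ground truth

# Toy example: the target is ranked 3rd -> HR@10 = 1.0, NDCG@10 = 1/log2(4) = 0.5
print(hr_ndcg_at_k([5, 8, 3, 9], target_item=3, k=10))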

The reported results in the paper are as follows:

| HR@10 | Taobao | Tmall | Jdata |
|----------|--------|--------|--------|
| LightGCN | 0.0411 | 0.0393 | 0.2252 |
| CRGCN | 0.0855 | 0.0840 | 0.5001 |
| MB-CGCN | 0.1233 | 0.0984 | 0.4349 |
| HPMR | 0.1104 | 0.0956 | - |
| PKEF | 0.1385 | 0.1277 | 0.4334 |
| MB-HGCN | 0.1299 | 0.1443 | 0.5406 |
| MuLe | 0.1918 | 0.2112 | 0.5889 |
| % diff | 38.5% | 44.6% | 10.3% |

| NDCG@10 | Taobao | Tmall | Jdata |
|----------|--------|--------|--------|
| LightGCN | 0.0240 | 0.0209 | 0.1436 |
| CRGCN | 0.0439 | 0.0442 | 0.2914 |
| MB-CGCN | 0.0677 | 0.0558 | 0.2758 |
| HPMR | 0.0599 | 0.0515 | - |
| PKEF | 0.0785 | 0.0721 | 0.2615 |
| MB-HGCN | 0.0690 | 0.0769 | 0.3555 |
| MuLe | 0.1103 | 0.1177 | 0.4061 |
| % diff | 40.5% | 52.9% | 25.4% |

Validated hyperparameters of MuLe

We provide the validated hyperparameters of MuLe for each dataset to ensure reproducibility.

| Dataset | $\eta$ | $\lambda$ | $L_{\texttt{tda}}$ | $L_{\texttt{light}}$ | $d$ | $T$ | $B$ |
|---------|--------|-----------|--------------------|----------------------|-----|-----|-----|
| Taobao | 1e-4 | 0 | 4 | 1 | 64 | 100 | 1024 |
| Tmall | 5e-4 | 1e-5 | 5 | 1 | 64 | 100 | 1024 |
| Jdata | 1e-3 | 0 | 5 | 1 | 64 | 100 | 1024 |

Description of each hyperparameter

  • $\eta$: learning rate of the Adam optimizer (--lr)
  • $\lambda$: weight decay for L2-regularization (--weight_decay)
  • $L_{\texttt{tda}}$: number of TDA layers (--tda_layers)
  • $L_{\texttt{light}}$: number of LightGCN layers (--gnn_layers)
  • $d$: embedding dimension (--emb_dim)
  • $T$: number of epochs (--num_epochs)
  • $B$: batch size for target data (--batch_size)

Detailed Options

You can train and evaluate the model on your own dataset with custom hyperparameters using the following options:

| Option | Description | Default |
|--------|-------------|---------|
| dataset | dataset name | taobao |
| data_dir | data directory path | ./data |
| checkpoint_dir | checkpoint directory path | ./checkpoint |
| load_checkpoint | whether to load the configuration used in a pre-trained model | False |
| batch_size | batch size for target data | 1024 |
| lr | learning rate | 0.0001 |
| weight_decay | strength $\lambda$ of L2 regularization | 0.00001 |
| gnn_layers | number of LightGCN layers | 1 |
| tda_layers | number of TDA layers | 4 |
| emb_dim | embedding dimension | 64 |
| num_epochs | number of epochs | 100 |
| seed | random seed; if None, the seed is not fixed | 42 |
| device | training device | cuda:0 |
| topk | top-k items for evaluation | 10 |
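
For example, a run on your own dataset with non-default options could look like the command below; my_dataset is a placeholder, and the data must first be preprocessed into the same format as the provided datasets.

python ./src/main.py --dataset my_dataset \
                     --data_dir ./data \
                     --checkpoint_dir ./checkpoint \
                     --lr 5e-4 \
                     --weight_decay 1e-5 \
                     --tda_layers 4 \
                     --gnn_layers 1 \
                     --emb_dim 64 \
                     --num_epochs 100 \
                     --batch_size 1024 \
                     --seed 42 \
                     --device cuda:0 \
                     --topk 10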