Official source code for the AAAI 2023 paper "Uniform Sequence Better: Time Interval Aware Data Augmentation for Sequential Recommendation".
In this paper, we explored the impact of time intervals on sequential recommendation. Our basic idea is that uniform sequences are more valuable for next-item prediction, and we validated this assumption with an empirical study. We then proposed five data operators that augment item sequences based on their time intervals. Experiments on four public datasets verified the effectiveness of the proposed operators for data augmentation. To the best of our knowledge, this is the first work to study the distribution of time intervals in sequential recommendation. In future work, we intend to also consider item categories for data augmentation, and to study how time intervals and item categories can be leveraged together for better performance.
Python >= 3.7
torch == 1.11.0+cu113 (we have not tested the code with lower versions of torch)
numpy == 1.20.1
gensim == 4.2.0
tqdm == 4.59.0
pandas == 1.2.4
Processed Beauty, Sports and Home datasets, along with two different versions of the Yelp dataset, are included in the `data` folder.
- `XXX_org_rank`: users keep their original order.
- `XXX_var_rank`: users are ranked by the variance of their interaction time intervals.
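The `XXX_var_rank` ordering can be pictured with the minimal sketch below. Several details here are assumptions, not taken from the scripts in `data_process`:
- each user's timestamps are held in memory as a dict;
- "variance of the interaction time interval" is computed as the population variance of the gaps between consecutive interactions;
- users are sorted in ascending order of that variance.

The real file format, variance definition and sort direction may differ.

```python
from statistics import pvariance
from typing import Dict, List


def interval_variance(timestamps: List[int]) -> float:
    """Population variance of the gaps between consecutive interactions."""
    ts = sorted(timestamps)  # assumption: order interactions chronologically first
    gaps = [later - earlier for earlier, later in zip(ts, ts[1:])]
    # Hypothetical convention: a user with no gaps (0 or 1 interaction) gets 0.0.
    return float(pvariance(gaps)) if gaps else 0.0


def rank_users_by_variance(user_times: Dict[str, List[int]]) -> List[str]:
    """User ids sorted by ascending time-interval variance (assumed direction)."""
    return sorted(user_times, key=lambda user: interval_variance(user_times[user]))


if __name__ == "__main__":
    user_times = {"u1": [0, 10, 20, 30], "u2": [0, 1, 50, 51], "u3": [5]}
    print(rank_users_by_variance(user_times))  # ['u1', 'u3', 'u2']
```

User `u1` interacts at perfectly regular intervals (variance 0), while `u2` alternates short and long gaps (variance 512). A uniform sequence therefore ranks ahead of an irregular one under this assumed ordering.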
You can use the code in the `data_process` folder to process your own dataset; the role of each file is explained at its beginning.
For Yelp, we provide two processed datasets built from two different versions of Yelp. Yelp-A is processed from Yelp2020 and has 316,354 interactions. Yelp-B is processed from Yelp2022 and has 207,045 interactions.
Delete `_org_rank` or `_var_rank` from the file names. Example: to use the Sports dataset ranked by variance, rename `Sports_item_var_rank.txt` to `Sports_item.txt` and `Sports_time_var_rank.txt` to `Sports_time.txt`.
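The renaming step can be scripted with the sketch below. The helper name `select_ranking` is hypothetical. It assumes every dataset follows the `<name>_item_<variant>_rank.txt` / `<name>_time_<variant>_rank.txt` pattern from the Sports example. It copies rather than renames, so that both rankings stay available for later runs.

```python
import shutil
from pathlib import Path


def select_ranking(data_dir: str, name: str, variant: str = "var") -> None:
    """Copy <name>_{item,time}_<variant>_rank.txt to <name>_{item,time}.txt.

    Hypothetical helper; assumes the naming pattern shown in the Sports example.
    """
    if variant not in ("var", "org"):
        raise ValueError("variant must be 'var' or 'org'")
    root = Path(data_dir)
    for kind in ("item", "time"):
        src = root / "{}_{}_{}_rank.txt".format(name, kind, variant)
        dst = root / "{}_{}.txt".format(name, kind)
        shutil.copyfile(src, dst)  # overwrites dst if it already exists


if __name__ == "__main__":
    select_ranking("./data", "Sports", "var")
```

Switching to the original-order files is then `select_ranking("./data", "Sports", "org")`.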
Change to the `src` folder and run the following command. The program reads the data files according to [DATA_NAME]; [Model_idx] and [GPU_ID] can be set according to your needs.
python main.py --data_name=[DATA_NAME] --model_idx=[Model_idx] --gpu_id=[GPU_ID]
Examples:
python main.py --data_name=Beauty --model_idx=1 --augmentation_warm_up_epochs=350 --mask_mode=maximum --substitute_rate=0.2 --crop_rate=0.4 --mask_rate=0.7 --reorder_rate=0.5 --gpu_id=0
python main.py --data_name=Sports --model_idx=1 --substitute_rate=0.2 --weight_decay=1e-5 --patience=100 --gpu_id=0
python main.py --data_name=Home --model_idx=1 --mask_mode=random --reorder_rate=0.4 --mask_rate=0.6 --patience=50 --weight_decay=1e-7 --gpu_id=0
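When launching many runs, it can help to assemble these command lines programmatically. The sketch below turns keyword arguments into `--key=value` flags in the same order as the Beauty example. The helpers `build_command` and `run_in_src` are hypothetical. Only flags actually defined in `src/main.py` should be passed.

```python
import shlex
import subprocess
from typing import List


def build_command(data_name: str, model_idx: int, gpu_id: int, **overrides) -> List[str]:
    """Assemble a main.py invocation; each extra keyword becomes a --key=value flag."""
    args = {"data_name": data_name, "model_idx": model_idx}
    args.update(overrides)  # keyword order is preserved (Python 3.7+ dicts)
    args["gpu_id"] = gpu_id
    return ["python", "main.py"] + ["--{}={}".format(k, v) for k, v in args.items()]


def run_in_src(cmd: List[str], src_dir: str = "src") -> None:
    """Echo the command, then run it from the src folder; raise if it fails."""
    print(" ".join(shlex.quote(part) for part in cmd))
    subprocess.run(cmd, cwd=src_dir, check=True)


if __name__ == "__main__":
    beauty = build_command("Beauty", 1, 0, augmentation_warm_up_epochs=350,
                           mask_mode="maximum", substitute_rate=0.2, crop_rate=0.4,
                           mask_rate=0.7, reorder_rate=0.5)
    print(" ".join(beauty))
    # run_in_src(beauty)  # uncomment to actually launch training
```

The printed line reproduces the Beauty example command above. Passing the arguments as a list to `subprocess.run` avoids shell-quoting issues.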
The code outputs the training log, the log of each test, and the `.pt` file of each test. You can change the test frequency in `src/main.py`.
The meaning and usage of all other parameters are explained in `src/main.py`; you can change them as needed.
If you use your own dataset, here are some suggestions and ranges for fine-tuning the hyper-parameters:
- augment_threshold: needs to be adjusted according to the dataset.
- augment_type_for_short: `SIM` generally works better, but you can try other operator combinations.
- ratio/rate for data augmentation operators: range [0.1, 0.9], step 0.1 or 0.2.
- var_rank_not_aug_ratio: range [0.1, 0.5], step 0.1 or 0.05.
- attn_dropout_prob and hidden_dropout_prob: range [0.2, 0.5], step 0.1.
- weight_decay: range [1e-8, 1e-4].
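The suggested ranges above can be enumerated as a grid, as sketched below. This sketch makes several simplifying assumptions:
- only `substitute_rate` stands in for the operator rates; the other rate flags would be added the same way;
- `attn_dropout_prob` is left out for brevity;
- a step of 0.1 is used throughout;
- weight decay moves in powers of ten;
- `augment_threshold` has no suggested range, so it is omitted.

The full product grows quickly, so random sampling from it may be more practical.

```python
from itertools import product
from typing import Dict, Iterator, List


def frange(start: float, stop: float, step: float, ndigits: int = 4) -> List[float]:
    """Inclusive float range; rounding avoids values like 0.30000000000000004."""
    count = int(round((stop - start) / step)) + 1
    return [round(start + i * step, ndigits) for i in range(count)]


SEARCH_SPACE: Dict[str, List[float]] = {
    "substitute_rate": frange(0.1, 0.9, 0.1),
    "var_rank_not_aug_ratio": frange(0.1, 0.5, 0.1),
    "hidden_dropout_prob": frange(0.2, 0.5, 0.1),
    "weight_decay": [1e-4, 1e-5, 1e-6, 1e-7, 1e-8],
}


def grid(space: Dict[str, List[float]]) -> Iterator[Dict[str, float]]:
    """Yield every combination of the listed values as a {flag: value} dict."""
    keys = list(space)
    for values in product(*(space[k] for k in keys)):
        yield dict(zip(keys, values))


if __name__ == "__main__":
    configs = list(grid(SEARCH_SPACE))
    print(len(configs))  # 9 * 5 * 4 * 5 = 900
    print(configs[0])
```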
Change to the `src` folder and move the `.pt` file into the `src/output` folder. We provide weight files for the Beauty, Sports and Home datasets.
Run the following command.
python main.py --data_name=[DATA_NAME] --eval_path=[EVAL_PATH] --do_eval --gpu_id=[GPU_ID]
Examples:
python main.py --data_name=Beauty --eval_path=./output/Beauty.pt --do_eval --gpu_id=0
python main.py --data_name=Sports --eval_path=./output/Sports.pt --do_eval --gpu_id=0
python main.py --data_name=Home --eval_path=./output/Home.pt --do_eval --gpu_id=0
Beauty results: {'stage': 'test', 'epoch': 0, 'HIT@5': '0.0504', 'NDCG@5': '0.0343', 'HIT@10': '0.0740', 'NDCG@10': '0.0418', 'HIT@20': '0.1068', 'NDCG@20': '0.0501'}
Sports results: {'stage': 'test', 'epoch': 0, 'HIT@5': '0.0334', 'NDCG@5': '0.0227', 'HIT@10': '0.0514', 'NDCG@10': '0.0284', 'HIT@20': '0.0768', 'NDCG@20': '0.0348'}
Home results: {'stage': 'test', 'epoch': 0, 'HIT@5': '0.0182', 'NDCG@5': '0.0127', 'HIT@10': '0.0266', 'NDCG@10': '0.0154', 'HIT@20': '0.0390', 'NDCG@20': '0.0185'}
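To read the HIT@K and NDCG@K numbers above, the sketch below shows the standard definitions for one held-out target item per user. HIT@K is the fraction of users whose target ranks in the top K. NDCG@K gives a hit at 0-based rank r a gain of 1/log2(r + 2), and a miss a gain of 0. Two points are assumptions and are not confirmed here:
- that the repository uses this leave-one-out form;
- whether it ranks the target against all items or against sampled negatives.

```python
import math
from typing import Dict, List, Sequence


def hit_ndcg_at_k(ranks: List[int], ks: Sequence[int] = (5, 10, 20)) -> Dict[str, float]:
    """ranks[i] is the 0-based position of user i's ground-truth item in the ranked list."""
    out = {}
    for k in ks:
        hits = [1.0 if r < k else 0.0 for r in ranks]
        gains = [1.0 / math.log2(r + 2) if r < k else 0.0 for r in ranks]
        out["HIT@{}".format(k)] = sum(hits) / len(ranks)
        out["NDCG@{}".format(k)] = sum(gains) / len(ranks)
    return out


if __name__ == "__main__":
    print(hit_ndcg_at_k([0, 3, 12, 40]))
```

With one relevant item, the ideal DCG is 1, so no extra normalisation is needed. NDCG@K is always at most HIT@K, which matches the pattern in the logs above.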
We thank them for providing an efficient implementation.
Please cite our paper if you use this code.
@article{dang2022uniform,
title={Uniform Sequence Better: Time Interval Aware Data Augmentation for Sequential Recommendation},
author={Dang, Yizhou and Yang, Enneng and Guo, Guibing and Jiang, Linying and Wang, Xingwei and Xu, Xiaoxiao and Sun, Qinghui and Liu, Hong},
journal={arXiv preprint arXiv:2212.08262},
year={2022}
}