- Run `pip3 install -r requirements.txt` to install the necessary Python packages.
- Run `cd pretrain_data; python3 preprocess.py -d -m -s; cd ..` to pre-process the pre-training data, where `-d`, `-m`, and `-s` enable data downloading, metadata extraction, and interaction sequence generation, respectively.
- Run `cd finetune_data; python3 preprocess.py -d -m -s; cd ..` to pre-process the fine-tuning data, where the flags have the same meanings as above.
- Run `bash 1-pretrain.sh` to perform pre-training.
- Run `python 2-convert_pretrained_ckpt.py` to convert the Lightning checkpoint into a plain PyTorch model.
- Run `bash 3-finetune.sh` to perform fine-tuning on the 6 datasets specified in the original work.
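The checkpoint conversion step usually amounts to unwrapping the LightningModule's state dict, since Lightning prefixes every parameter name with the attribute under which the inner model is stored. A minimal sketch of that idea, assuming a `model.` prefix; the prefix, keys, and dummy values below are illustrative assumptions, not the actual layout used by `2-convert_pretrained_ckpt.py`:

```python
# Sketch of Lightning-to-plain-PyTorch checkpoint conversion: a Lightning
# checkpoint keeps the wrapped model's weights under "state_dict", with each
# parameter name prefixed by the LightningModule attribute name (assumed
# "model." here). Stripping the prefix yields keys the plain torch model
# can load.

def strip_lightning_prefix(state_dict, prefix="model."):
    """Drop the LightningModule attribute prefix from parameter names."""
    return {
        (k[len(prefix):] if k.startswith(prefix) else k): v
        for k, v in state_dict.items()
    }

# Toy example with dummy weights; the real script would use torch.load on
# the .ckpt file and torch.save on the converted state dict.
ckpt = {"state_dict": {"model.longformer.embeddings.weight": [0.1],
                       "model.lm_head.bias": [0.0]}}
plain = strip_lightning_prefix(ckpt["state_dict"])
print(sorted(plain))  # parameter names without the "model." prefix
```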
- `pretrain_data/preprocess.py` pre-processes the pre-training dataset.
- `finetune_data/preprocess.py` pre-processes the fine-tuning datasets.
- `lightning_dataloader.py` is the dataloader for pre-training.
- `dataloader.py` is the dataloader for fine-tuning.
- `collator.py` collects and processes data into batches; its output can be fed directly to pre-training, fine-tuning, evaluation, and testing.
- `recformer/tokenization.py` tokenizes item sequences into token ids, token position ids, token type ids, and item position ids.
- `recformer/models.py` implements the base model, which extends Longformer with 4 embedding layers, along with the models for pre-training and prediction.
- `lightning_litwrapper.py` is the LightningModule wrapper that simplifies pre-training with PyTorch Lightning.
- `lightning_pretrain.py` is the script that performs pre-training with the PyTorch Lightning framework.
- `finetune.py` fine-tunes the pre-trained model.
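To make the four id sequences concrete, here is a simplified sketch of how a tokenizer like `recformer/tokenization.py` could flatten an item sequence into token ids, token position ids, token type ids, and item position ids. The function name, the toy vocabulary, and the key/value type encoding are assumptions for illustration, not the repository's actual implementation:

```python
# Each item is a dict of attribute (key, value) pairs. Flattening a sequence
# of items yields, per token: a token id (from a growing toy vocabulary), a
# token position id (global position in the flattened sequence), a token
# type id (0 for attribute keys, 1 for attribute values), and an item
# position id (index of the item the token came from).

TOKEN_TYPE = {"key": 0, "value": 1}

def encode_item_sequence(items, vocab):
    token_ids, token_pos, token_type, item_pos = [], [], [], []
    for item_idx, item in enumerate(items):
        for key, value in item.items():
            for word, typ in ((key, "key"), (value, "value")):
                token_ids.append(vocab.setdefault(word, len(vocab)))
                token_pos.append(len(token_ids) - 1)
                token_type.append(TOKEN_TYPE[typ])
                item_pos.append(item_idx)
    return token_ids, token_pos, token_type, item_pos

vocab = {}
seq = [{"title": "blue-shirt"}, {"title": "red-shoes"}]
ids, pos, typ, itm = encode_item_sequence(seq, vocab)
print(itm)  # [0, 0, 1, 1]: first two tokens from item 0, next two from item 1
```

The real tokenizer maps words with a pretrained subword vocabulary rather than building one on the fly, but the four parallel id sequences it emits follow the same pattern.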