Python version: this code is written in Python 3.6.
Package Requirements: torch==1.5.0 pytorch_transformers tensorboardX multiprocess pyrouge
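The requirements above can be kept in a `requirements.txt`; only the torch version is pinned by the source, the rest are left unpinned:

```
torch==1.5.0
pytorch_transformers
tensorboardX
multiprocess
pyrouge
```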
Some code is borrowed from ONMT (https://github.com/OpenNMT/OpenNMT-py).
- CNN/DM DistilExt (8-layer Transformer)
- XSum DistilExt (6-layer Transformer)
For the data preprocessing steps, please refer to PreSumm for more information.
We provide our pre-processed data here.
First run: for the first time, use a single GPU so the code can download the BERT model (use `-visible_gpus -1`); after downloading, you can kill the process and rerun the code with multiple GPUs.
The scripts below are in the folder `src`.
bash cnndm_train.sh
# train teacher
bash xsum_train.sh
# train student
bash xsum_train_stu.sh
# this shell script will validate all the saved model steps during training
bash cnndm_val.sh
# this shell script will validate all the saved model steps during training
bash xsum_val.sh
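The validation scripts above watch the model directory and evaluate each newly saved checkpoint. A minimal sketch of how such a loop could discover checkpoints, assuming the usual `model_step_<N>.pt` naming (the actual pattern used by the scripts may differ):

```python
import os
import re

def sorted_checkpoints(model_dir):
    """Return checkpoint paths sorted by training step.

    Hypothetical helper illustrating how `-mode validate` could walk the
    model directory; it matches files named like model_step_20000.pt.
    """
    steps = []
    for name in os.listdir(model_dir):
        m = re.match(r"model_step_(\d+)\.pt$", name)
        if m:
            steps.append((int(m.group(1)), os.path.join(model_dir, name)))
    # Sort numerically by step so checkpoints are evaluated in save order.
    return [path for _, path in sorted(steps)]
```

A validation loop would then evaluate any path in this list it has not seen yet.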
bash cnndm_test_single.sh
# test teacher
bash xsum_test_teacher.sh
# test student
bash xsum_test_single.sh
- `-mode` can be {`validate`, `test`}, where `validate` will inspect the model directory and evaluate the model for each newly saved checkpoint; `test` needs to be used with `-test_from`, indicating the checkpoint you want to use
- `MODEL_PATH` is the directory of saved checkpoints
- use `-mode validate` with `-test_all`: the system will load all saved checkpoints and select the top ones to generate summaries (this will take a while)
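The "select the top ones" step in `-test_all` amounts to ranking checkpoints by their validation score and keeping the best few. A hypothetical sketch (the function name, the dict input, and ranking by validation loss are assumptions, not the repo's actual code):

```python
def top_checkpoints(val_losses, k=3):
    """Return the k checkpoint names with the lowest validation loss.

    val_losses: dict mapping checkpoint name -> validation loss.
    Sketches the selection `-test_all` performs before generating
    summaries with only the best checkpoints.
    """
    # Lower loss is better, so sort ascending and keep the first k.
    return sorted(val_losses, key=val_losses.get)[:k]
```

For example, with four validated checkpoints, `top_checkpoints(losses, k=3)` would keep the three lowest-loss ones and only those are then tested with ROUGE.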