This is the official code for the paper 'Do We Really Need That Many Parameters In Transformer For Extractive Summarization? Discourse Can Help !' (CODI at EMNLP 2020)
You need:
- Python 3
- PyTorch
- pandas
- NumPy
- rouge_papier_v2, which can be found here
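A minimal environment setup might look like the sketch below. The package versions are unpinned and the rouge_papier_v2 URL is a placeholder, since the repository specifies neither:

```bash
# Optional: create and activate an isolated environment
python3 -m venv venv
source venv/bin/activate

# Core dependencies (versions are not pinned by this repository)
pip install torch pandas numpy

# rouge_papier_v2 is installed from its own source tree;
# replace <rouge_papier_v2-url> with the link given above
# (assuming the package ships a standard setup.py)
git clone <rouge_papier_v2-url> rouge_papier_v2
pip install -e rouge_papier_v2
```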
The CNNDM dataset with generated attention maps (C-tree w/ Nuc) can be found here. It is based on the dataset from DiscoBERT, with documents already segmented into EDUs.
The trained model with discourse-tree attention can be found here.
We use the state-of-the-art Discourse Parser.
Run `python main.py` with the following arguments (an example invocation follows the list):
- `-bert_dir` indicates where to store the pretrained BERT model
- `-d_v`, `-d_k`, `-d_inner`, `-d_mlp`, `-n_layers`, `-n_head`, `-dropout` are the parameters of the Transformer-based document encoder
- `-lr`, `-warmup_steps` are the parameters for the Adam optimizer
- `-inputs_dir`, `-val_inputs_dir` are the paths to the training and validation data
- `-unit`, `-unit_length_limit`, `-word_length_limit` indicate whether to use sentences or EDUs as the basic unit, and the length limits of the generated summaries
- `-batch_size` indicates the number of instances per batch
- `-attention_type` is chosen from 'tree', 'dense', 'fixed_rand', 'learned_rand', 'none' and 'self-attention'
- `-device` indicates which GPU device to use
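A minimal sketch of a training run, assuming the tree-attention variant with EDUs as the basic unit; all paths and numeric values below are illustrative assumptions, not settings taken from the paper:

```bash
# All paths and numeric values are assumptions for illustration.
python main.py \
    -bert_dir ./bert \
    -inputs_dir ./data/train \
    -val_inputs_dir ./data/val \
    -attention_type tree \
    -unit edu \
    -unit_length_limit 6 \
    -word_length_limit 100 \
    -batch_size 32 \
    -device 0
```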
Run `python test.py` with the following arguments (an example invocation follows the list):
- `-model_path`, `-model_name` indicate the folder and name of the saved model; the model to evaluate is 'model_path/model_name'
- `-test_inputs_dir` indicates the path to the test data
- `-device` indicates which GPU device to use
- `-d_v`, `-d_k`, `-d_inner`, `-d_mlp`, `-n_layers`, `-n_head`, `-dropout` are the parameters of the Transformer-based document encoder and should match the values used at training time
- `-unit`, `-unit_length_limit`, `-word_length_limit` indicate whether to use sentences or EDUs as the basic unit, and the length limits of the generated summaries
- `-attention_type` is chosen from 'tree', 'dense', 'fixed_rand', 'learned_rand', 'none' and 'self-attention'
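A matching evaluation sketch; as above, the paths and values are assumptions and should mirror the ones used for training:

```bash
# Paths and values are assumptions; the encoder hyperparameters
# must match those of the trained checkpoint.
python test.py \
    -model_path ./checkpoints \
    -model_name model.pt \
    -test_inputs_dir ./data/test \
    -attention_type tree \
    -unit edu \
    -unit_length_limit 6 \
    -word_length_limit 100 \
    -device 0
```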