This is the official code for the paper 'Do We Really Need That Many Parameters In Transformer For Extractive Summarization? Discourse Can Help !' (CODI at EMNLP 2020)
You need:
- Python 3
- PyTorch
- pandas
- NumPy
- rouge_papier_v2, which can be found here
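A minimal environment setup might look like the sketch below. The package versions are unpinned and the rouge_papier_v2 URL is a placeholder, since the repository specifies neither:

```bash
# Optional: create and activate an isolated environment
python3 -m venv venv
source venv/bin/activate

# Core dependencies (versions are not pinned by this repository)
pip install torch pandas numpy

# rouge_papier_v2 is installed from its own source tree;
# replace <rouge_papier_v2-url> with the link given above
# (assuming the package ships a standard setup.py)
git clone <rouge_papier_v2-url> rouge_papier_v2
pip install -e rouge_papier_v2
```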
The CNNDM dataset with generated attention maps (C-tree w/ Nuc) can be found here. It is based on the dataset from DiscoBERT, with documents already segmented into EDUs.
The trained model with discourse-tree attention can be found here.
We use the state-of-the-art Discourse Parser.
Run `python main.py` with the following arguments (an example invocation follows the list):
- `-bert_dir` indicates where to store the pretrained BERT model
- `-d_v`, `-d_k`, `-d_inner`, `-d_mlp`, `-n_layers`, `-n_head`, `-dropout` are the parameters of the Transformer-based document encoder
- `-lr`, `-warmup_steps` are the parameters for the Adam optimizer
- `-inputs_dir`, `-val_inputs_dir` are the paths to the training and validation data
- `-unit`, `-unit_length_limit`, `-word_length_limit` indicate whether to use sentences or EDUs as the basic unit, and the length limits of the generated summaries
- `-batch_size` indicates the number of instances per batch
- `-attention_type` is chosen from 'tree', 'dense', 'fixed_rand', 'learned_rand', 'none' and 'self-attention'
- `-device` indicates which GPU device to use
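A minimal sketch of a training run, assuming the tree-attention variant with EDUs as the basic unit; all paths and numeric values below are illustrative assumptions, not settings taken from the paper:

```bash
# All paths and numeric values are assumptions for illustration.
python main.py \
    -bert_dir ./bert \
    -inputs_dir ./data/train \
    -val_inputs_dir ./data/val \
    -attention_type tree \
    -unit edu \
    -unit_length_limit 6 \
    -word_length_limit 100 \
    -batch_size 32 \
    -device 0
```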
Run `python test.py` with the following arguments (an example invocation follows the list):
- `-model_path`, `-model_name` indicate the folder and name of the saved model; the model to evaluate is 'model_path/model_name'
- `-test_inputs_dir` indicates the path to the test data
- `-device` indicates which GPU device to use
- `-d_v`, `-d_k`, `-d_inner`, `-d_mlp`, `-n_layers`, `-n_head`, `-dropout` are the parameters of the Transformer-based document encoder and should match the values used at training time
- `-unit`, `-unit_length_limit`, `-word_length_limit` indicate whether to use sentences or EDUs as the basic unit, and the length limits of the generated summaries
- `-attention_type` is chosen from 'tree', 'dense', 'fixed_rand', 'learned_rand', 'none' and 'self-attention'
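A matching evaluation sketch; as above, the paths and values are assumptions and should mirror the ones used for training:

```bash
# Paths and values are assumptions; the encoder hyperparameters
# must match those of the trained checkpoint.
python test.py \
    -model_path ./checkpoints \
    -model_name model.pt \
    -test_inputs_dir ./data/test \
    -attention_type tree \
    -unit edu \
    -unit_length_limit 6 \
    -word_length_limit 100 \
    -device 0
```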