Dynamic ensemble decoding with transformer-based models

Primary language: Jupyter Notebook. License: MIT.

DynE: Dynamic Ensemble Decoding for Multi-Document Summarization

This repo contains the code for DynE: Dynamic Ensemble Decoding for Multi-Document Summarization.

This code base can be used to add dynamic ensembling capability to models from the Huggingface transformers library.
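As a rough illustration of what ensembling at decoding time can look like, the sketch below combines next-token log-probabilities produced by the same model for several input documents and picks the best token from the combined distribution. It assumes a plain average of log-probabilities; the reduction used by this repo may differ. `ensemble_step` and the toy vocabulary are hypothetical and are not part of this code base or of the transformers library.

```python
import math
from typing import Dict, List


def ensemble_step(per_input_log_probs: List[Dict[str, float]]) -> Dict[str, float]:
    """Average next-token log-probabilities across inputs (hypothetical helper).

    Assumption: every input's distribution covers the same vocabulary.
    """
    n = len(per_input_log_probs)
    vocab = per_input_log_probs[0].keys()
    return {tok: sum(lp[tok] for lp in per_input_log_probs) / n for tok in vocab}


# Toy next-token distributions from one model conditioned on two different articles.
doc_a = {"storm": math.log(0.6), "fire": math.log(0.3), "vote": math.log(0.1)}
doc_b = {"storm": math.log(0.5), "fire": math.log(0.1), "vote": math.log(0.4)}

combined = ensemble_step([doc_a, doc_b])
# Greedy choice for this timestep; beam search would keep several candidates instead.
best_token = max(combined, key=combined.get)
```

In a real decoder this step would run once per output position, with the chosen token appended to the shared prefix for every input.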

Setup / Installation

# make a fresh environment
conda create -n dynamic-ensembles python=3.6
conda activate dynamic-ensembles

# install
make dev

Multi-Document Summarization (MDS) Datasets

MDS datasets in the format required by the scripts in this repo:

The original WCEP dataset used to generate the flat training data:


Model Checkpoints and Outputs

Model Checkpoints

We fine-tune the bart-large-cnn single-document summarization model from the transformers library.

  • The best fine-tuned model checkpoints for WCEP and MultiNews are here

Fine-tuned Model Outputs

  • Download the outputs of fine-tuned models on the test sets of WCEP and MultiNews here

Evaluation

Prediction and evaluation are done by the script transformer_decoding/evaluate.py. There is also a make task for evaluation, which simply calls this script.

For example, to predict using a model id from transformers, or with a fine-tuned model checkpoint, and evaluate with the Ghalandari et al. 2020 evaluation workflow:

MODEL_ID=model_checkpoints/wcep_fine-tune-bart-large/checkpointepoch\=1.ckpt \
RUN_FLAGS='--max-articles-in-cluster 5 --max-src-length 512 --max-tgt-length 64 --num-beams 5 --eval-prefix wcep_5_articles_' \
make evaluate

  • Pretrained model checkpoints can be downloaded from the links above.

For a quick test, use the --rows-to-eval argument, which restricts prediction to the first N rows of the dataset:

MODEL_ID=model_checkpoints/wcep_fine-tune-bart-large/checkpointepoch\=1.ckpt \
RUN_FLAGS='--max-articles-in-cluster 5 --max-src-length 512 --max-tgt-length 64 --num-beams 5 --rows-to-eval 10 --eval-prefix wcep_5_articles_' \
make evaluate
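To make the "first N rows" behaviour concrete, here is a minimal sketch of reading only the first N records of a JSONL file. It is not the logic of transformer_decoding/evaluate.py; `first_n_rows` and the `id` field are hypothetical, and an in-memory buffer stands in for a file such as data/WCEP/test.jsonl.

```python
import io
import itertools
import json
from typing import Iterator, TextIO


def first_n_rows(handle: TextIO, n: int) -> Iterator[dict]:
    """Yield the first n non-empty JSONL records from an open file (hypothetical helper)."""
    non_empty = (line for line in handle if line.strip())
    for line in itertools.islice(non_empty, n):
        yield json.loads(line)


# In-memory stand-in for a JSONL dataset; the "id" field is made up for illustration.
fake_file = io.StringIO('{"id": 0}\n{"id": 1}\n{"id": 2}\n')
rows = list(first_n_rows(fake_file, 2))
```

Because the reader stops after N records, the rest of the file is never parsed, which is what keeps a quick test quick on a large dataset.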

To run evaluation only, using previously generated predictions, supply the --predictions argument to transformer_decoding/evaluate.py:

EVALUATION_DATASET=data/WCEP/test.jsonl \
RUN_FLAGS='--predictions outputs/wcep/wcep_5_articles_eval_predicted_summaries.out' \
make evaluate

Scoring Gold Summaries by Forced Decoding

EVALUATION_DATASET=data/WCEP/test.jsonl \
RUN_FLAGS='--force-decode-gold --max-articles-in-cluster 5 --max-src-length 512 --max-tgt-length 512 --num-beams 1 --rows-to-eval 10 --eval-prefix wcep_5_articles_' \
make evaluate
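Forced decoding generally means feeding the reference summary to the model one token at a time and recording the probability the model assigns to each gold token, rather than searching for an output. The sketch below assumes the score is the sum of gold-token log-probabilities; the script may compute or normalize it differently. `force_decode_score` and the toy distributions are hypothetical.

```python
import math
from typing import Dict, List


def force_decode_score(step_log_probs: List[Dict[str, float]], gold_tokens: List[str]) -> float:
    """Sum the log-probabilities assigned to the gold token at each step (hypothetical helper)."""
    if len(step_log_probs) != len(gold_tokens):
        raise ValueError("need exactly one distribution per gold token")
    return sum(dist[tok] for dist, tok in zip(step_log_probs, gold_tokens))


# Toy per-step distributions, each conditioned on the gold prefix so far.
steps = [
    {"storm": math.log(0.5), "fire": math.log(0.5)},
    {"hits": math.log(0.25), "ends": math.log(0.75)},
]
score = force_decode_score(steps, ["storm", "hits"])  # equals log(0.5 * 0.25)
```

A higher (less negative) score means the model finds the gold summary more likely given the inputs.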


Citing

If you use ideas or code from this project, please cite:

@article{DynamicEnsembles,
    title = {DynE: Dynamic Ensemble Decoding for Multi-Document Summarization},
    author = {Chris Hokamp and Demian Gholipour Ghalandari and Nghia The Pham
              and John Glover},
    journal = {arXiv preprint arXiv:2006.08748},
    year = {2020},
}