Code for the model presented in the thesis Query-Based Abstractive Summarization Using Neural Networks by Johan Hasselqvist and Niklas Helmertz.
Requirements:
- Python 2.7 or 3.5
- TensorFlow 1.1.0
Instructions for acquiring the dataset released along with this model can be found in a separate repository.
Pre-trained embeddings can be downloaded at https://nlp.stanford.edu/projects/glove. For the thesis work, 100-dimensional embeddings trained on "Wikipedia 2014 + Gigaword 5" were used.
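The GloVe files are plain text: one token per line, followed by its whitespace-separated vector components. As a minimal sketch (the `load_glove` function name and the sample lines below are illustrative, not part of this repository), the format can be parsed like this:

```python
import numpy as np

def load_glove(lines):
    """Parse GloVe-format lines into a {token: vector} dict.

    Each line is a token followed by its vector components,
    e.g. "the 0.418 0.24968 -0.41242 ...".
    """
    embeddings = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        token = parts[0]
        embeddings[token] = np.asarray(parts[1:], dtype=np.float32)
    return embeddings

# Two made-up 4-dimensional entries for illustration; the real
# 100-dimensional file has 100 components per token.
sample = [
    "the 0.1 0.2 0.3 0.4",
    "cat -0.5 0.6 -0.7 0.8",
]
vectors = load_glove(sample)
print(len(vectors), vectors["the"].shape)  # → 2 (4,)
```

In practice the same loop would iterate over the downloaded embeddings file (e.g. `open(path, encoding="utf-8")`) instead of an in-memory list.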
Replacing the parts in angle brackets, the model can be trained by running:
python querysum.py \
<path to embeddings file> \
<path to directory containing summary_vocabulary.txt and document_vocabulary.txt> \
--mode train \
--logdir <path to where model data is saved> \
--training_dir <path to training set root directory> \
--validation_dir <path to validation set root directory> \
--batch_size <the batch size, 30 by default>
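For concreteness, the placeholders above can be filled in as follows. This sketch builds the argument list in Python and prints it; every path here (`glove.6B.100d.txt`, `vocab`, `train`, `valid`, `model`) is a hypothetical example, not a name the code requires:

```python
# Assemble a filled-in training command; all paths are placeholders.
cmd = [
    "python", "querysum.py",
    "glove.6B.100d.txt",   # pre-trained embeddings file
    "vocab",               # dir with summary_vocabulary.txt and document_vocabulary.txt
    "--mode", "train",
    "--logdir", "model",           # where model data is saved
    "--training_dir", "train",     # training set root directory
    "--validation_dir", "valid",   # validation set root directory
    "--batch_size", "30",          # the default batch size
]
print(" ".join(cmd))
# An actual run could use: subprocess.run(cmd, check=True)
```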
Progress can be monitored using TensorBoard by running:
tensorboard --logdir <path to logdir>
From a trained model, summaries can be generated by running:
python querysum.py \
<path to embeddings file> \
<path to directory containing summary_vocabulary.txt and document_vocabulary.txt> \
--mode decode \
--logdir <path to logdir from a training session> \
--decode_dir <path to dataset directory, containing documents and queries, to generate summaries for> \
--decode_out_dir <path to directory where generated summaries are saved>