arggen-candela

Code for our ACL19 paper on argument generation

Primary language: Python. License: MIT.

About

This repository contains the code for our ACL 2019 paper, Argument Generation with Retrieval, Planning, and Realization.

Note: The main modifications to this repository are the use of torch.utils.data.Dataset for data loading, TensorBoard logging, and support for newer versions of Python and PyTorch.

Usage

Requirements:

  • python 3.7
  • PyTorch 1.6.0
  • numpy 1.15
  • tensorboardX 2.1
  • tqdm

Data:

The dataset we used is currently hosted on Google Drive and can be accessed via this link.

Pre-trained weights:

As described in the paper, we pre-train the encoder and realization decoder with extra data from changemyview. The pre-trained weights can be downloaded here: encoder; decoder

To train

We assume the data is located under the ./data/ directory and the pre-trained GloVe embeddings at ./embeddings/glove.6B.300d.txt. The following command trains the model:

python train.py \
    --exp-name=demo \
    --batch-size=16 \
    --max-epochs=30 \
    --save-freq=2 

Model checkpoints will be saved to ./checkpoints/[exp-name]/, and tensorboard logs will be saved to ./runs/[exp-name]/.
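For readers who want to inspect the embedding file before training, here is a minimal sketch of parsing the GloVe text format, in which each line holds a word followed by its space-separated vector values. The function name `load_glove` and the optional `vocab` filter are hypothetical and are not part of this repository; train.py may load the embeddings differently.

```python
import io


def load_glove(lines, vocab=None):
    """Parse GloVe text format: each line is `word v1 v2 ... vd`.

    `lines` is any iterable of strings (for example an open file).
    `vocab` is an optional set of words to keep; hypothetical helper,
    not taken from this repository.
    """
    vectors = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        word = parts[0]
        if vocab is not None and word not in vocab:
            continue
        vectors[word] = [float(x) for x in parts[1:]]
    return vectors


# Tiny in-memory sample standing in for the real file.
sample = io.StringIO("the 0.1 0.2 0.3\ncat 0.4 0.5 0.6\n")
toy = load_glove(sample, vocab={"cat"})
```

To read the real file, pass an open handle instead, e.g. `with open("./embeddings/glove.6B.300d.txt", encoding="utf-8") as f: vectors = load_glove(f)`. Restricting `vocab` to the words in the training data keeps memory usage low.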

To decode

We implement greedy decoding for sentence planning (phrase selection and sentence type prediction) and beam search for word decoding. The following sample script runs decoding with the model checkpoint from the demo experiment at epoch_id=30. If --use-goldstandard-plan is specified, the gold-standard sentence plan is used instead of greedy search. Unless --quiet is set, intermediate logs are printed to the console.

python generate.py \
    --epoch-id=30 \
    --exp-name=demo \
    --max-token-per-sentence=30 \
    --beam-size=5 \
    --max-phrase-selection-time=2 \
    --block-ngram-repeat=4 \
    [--use-goldstandard-plan \]
    [--quiet]
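To illustrate what beam search with n-gram blocking does in general, here is a small self-contained sketch. It is not the code in generate.py: the names `beam_search`, `has_repeated_ngram`, and `toy_step` are hypothetical, the toy scoring function stands in for the trained decoder, and we only assume that --block-ngram-repeat prevents a hypothesis from repeating an n-gram of the given size, as similar options do in other toolkits.

```python
import math


def has_repeated_ngram(tokens, n):
    """Return True if the last n tokens already occurred earlier in `tokens`."""
    if len(tokens) < n:
        return False
    last = tuple(tokens[-n:])
    earlier = {tuple(tokens[i:i + n]) for i in range(len(tokens) - n)}
    return last in earlier


def beam_search(step_fn, bos, eos, beam_size=5, max_len=30, block_ngram=4):
    """Generic beam search; `step_fn(tokens)` maps a prefix to {token: log_prob}."""
    beams = [(0.0, [bos])]  # (cumulative log-probability, tokens)
    finished = []
    for _ in range(max_len):
        candidates = []
        for score, tokens in beams:
            for tok, logp in step_fn(tokens).items():
                new_tokens = tokens + [tok]
                # Drop hypotheses that would repeat an n-gram.
                if block_ngram > 0 and has_repeated_ngram(new_tokens, block_ngram):
                    continue
                candidates.append((score + logp, new_tokens))
        if not candidates:
            break
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = []
        for score, tokens in candidates[:beam_size]:
            if tokens[-1] == eos:
                finished.append((score, tokens))
            else:
                beams.append((score, tokens))
        if not beams:
            break
    pool = finished or beams
    return max(pool, key=lambda c: c[0])[1]


def toy_step(tokens):
    """Hypothetical stand-in for a decoder: forces end-of-sentence after 3 words."""
    if len(tokens) >= 4:
        return {"</s>": 0.0}
    return {"a": math.log(0.6), "b": math.log(0.3), "</s>": math.log(0.1)}


best = beam_search(toy_step, "<s>", "</s>", beam_size=5, max_len=30, block_ngram=2)
```

With `block_ngram=2`, the returned sequence never contains the same bigram twice, so the highest-probability but repetitive output "a a a" is excluded from the beam.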

Contact

Xinyu Hua (hua.x [at] northeastern.edu)

License

See the LICENSE file for details.