This repo contains an implementation of the pointer-generator network from See et al. (2017) in PyTorch, wrapped in the AllenNLP framework. You need AllenNLP version 0.8.5.
We are releasing the pretrained model parameters from See et al. converted into PyTorch's format. You can download the pretrained parameter file from here.
To run the pretrained model without coverage, use the command
allennlp predict pretrained_model_skeleton.tar.gz --weights-file see_etal_weights_without_coverage.th --include-package pointergen --cuda-device <gpu-id> --predictor beamsearch <test-file>
where `weights-file` is the PyTorch pretrained weights file downloaded from the link above. A sample `test-file` has been provided to illustrate the input jsonl format.
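Each line of the jsonl file is a standalone JSON object. The exact field names are determined by this repo's dataset reader, so the keys in the sketch below are only assumptions; consult the provided sample file for the real format.

```json
{"source": "text of the article to be summarized .", "target": "reference summary (used for training and evaluation) ."}
```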
For the model with coverage, use
allennlp predict pretrained_model_skeleton_withcoverage.tar.gz --weights-file see_etal_weights_with_coverage.th --include-package pointergen --cuda-device <gpu-id> --predictor beamsearch <test-file>
The configuration for training a model is given in the `sample_experiment.jsonnet` file. You have to provide the paths to the training and validation jsonl files in it, and you can also customize the model size and optimizer parameters through this file.
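For orientation, an AllenNLP 0.8.x training config generally has the shape sketched below. Every type name and value here is a placeholder or an assumption on our part, not the repo's actual settings; treat `sample_experiment.jsonnet` itself as the source of truth.

```jsonnet
// Sketch of the overall layout only -- names and values are placeholders.
{
  "dataset_reader": {
    "type": "reader-registered-by-pointergen"   // hypothetical name
  },
  "train_data_path": "/path/to/train.jsonl",
  "validation_data_path": "/path/to/valid.jsonl",
  "model": {
    "type": "pointer_generator"                 // assumed name of the no-coverage model
    // model-size knobs (embedding / hidden dimensions) go here
  },
  "iterator": {
    "type": "basic",
    "batch_size": 16
  },
  "trainer": {
    "num_epochs": 10,
    "cuda_device": 0,
    "optimizer": {
      "type": "adam"                             // optimizer settings are configurable here
    }
  }
}
```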
To train a model, run the command

allennlp train sample_experiment.jsonnet --include-package pointergen -s <serialization-directory>
The above command will train the model without the coverage loss. To further train the model with the coverage loss, follow the steps below.
Suppose the serialization directory of the model trained without coverage is `precoverage`. Then run
allennlp train precoverage/config.json --include-package pointergen -s postcoverage --overrides '{"model":{"type":"pointer_generator_withcoverage"}, "trainer":{"num_epochs":1}, "vocabulary": {"extend": true, "directory_path": "precoverage/vocabulary", "max_vocab_size":1}}'
We will discuss why this command looks so ugly later. After running this command, you will be asked whether you want to load weights from a non-coverage model. Enter yes, and at the next prompt enter the path to the weights (`precoverage/best.th` in this case).
To make predictions on a test file, the usual command is used.
allennlp predict postcoverage/model.tar.gz --include-package pointergen --cuda-device 0 --predictor beamsearch sample_datafile.jsonl
You will see the same prompt; just reply no to it and the program will continue as usual.
We prefer to override the settings of the precoverage `config.json` file rather than create a new one, so that the other parameters of the experiment (e.g. the hidden size of the LSTM layers) are guaranteed to stay the same. The overrides change the model type to `pointer_generator_withcoverage`, and the number of epochs can be set as required for finetuning.

Since we will be loading pretrained weights, we must make sure that the vocabulary of the new model is the same as the old one's. This can easily fail to hold if, for example, you finetune on a slightly different training dataset or a subset of it. To enforce it, we give the path to the vocabulary of the old precoverage model. Ideally the `extend` flag would be set to false to disallow further additions to the vocabulary, but then allennlp complains that it was passed an extra parameter `max_vocab_size`, which was picked up from the `precoverage/config.json` file. So we set `extend` to true but pass a very small value of `max_vocab_size` through the override, one that is certainly smaller than the loaded vocabulary, so no further additions happen to it. Why didn't we set `max_vocab_size` to 0? Because allennlp interprets 0 as a directive to place no bound on the vocabulary size. So we set it to 1, which still has the effect we want.
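For readability, here is the same override pretty-printed with comments (the actual `--overrides` argument must stay the single-line JSON string shown in the command above):

```jsonnet
{
  "model": {"type": "pointer_generator_withcoverage"},   // switch to the coverage model
  "trainer": {"num_epochs": 1},                          // epochs of coverage finetuning
  "vocabulary": {
    "extend": true,                                      // true only so that max_vocab_size is accepted
    "directory_path": "precoverage/vocabulary",          // reuse the precoverage model's vocabulary
    "max_vocab_size": 1                                  // 1 (not 0) so nothing new is actually added
  }
}
```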