This repository contains an implementation based on the models described in "Data-to-Text Generation with Content Selection and Planning". It also includes a neural content planner for producing content plans. By default, however, this implementation uses template-based content plans, which lead to better generated texts.
Clone the repository, including its submodules, with the following command:
git clone --recurse-submodules https://github.com/potamides/Data2Text
For a quick start, download the preprocessed dataset and the pretrained models and place their contents in the root folder of this repository. This lets you skip preprocessing and training, both of which take a lot of time.
Make sure that pipenv is installed:
pip install pipenv
Then cd into the repository and create a virtual environment with the required dependencies:
cd $PATH_TO_REPOSITORY
pipenv install
After that, start a shell inside the newly created virtual environment:
pipenv shell
Generate the game summaries with the following command:
# could also be 'train' or 'test'
CORPUS=valid
./data2text.py generate --corpus $CORPUS
# to use content plans produced by the neural content planner instead of the templates:
./data2text.py generate --corpus $CORPUS --use-planner
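The generate command handles one corpus per run. A minimal sketch for generating summaries for all three corpora in sequence, assuming it is sourced into the pipenv shell from the repository root. The function name generate_all and the DATA2TEXT override variable are hypothetical and exist only for illustration; the override makes a dry run possible.

```shell
# Hypothetical helper, not part of the repository.
# DATA2TEXT defaults to the repository's entry point but can be
# overridden (e.g. DATA2TEXT=echo) to print the commands instead.
DATA2TEXT=${DATA2TEXT:-./data2text.py}

# Run 'generate' once per corpus; any extra arguments
# (such as --use-planner) are passed through unchanged.
# Stops at the first corpus whose command fails.
generate_all() {
  for corpus in train valid test; do
    "$DATA2TEXT" generate --corpus "$corpus" "$@" || return 1
  done
}
```

For example, calling generate_all --use-planner would produce planner-based summaries for every corpus, while generate_all alone uses the default template-based content plans.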
The generated texts are saved as markdown files in the generations folder. Each markdown file contains the generated summary, the gold summary, the associated records, information on which values were copied, the content plan, and the metrics.
To compare the texts by one of their metrics, sort them with the sort_by.sh script:
# could also be 'co_distance', 'rg_precision', 'rg_number', 'cs_precision' or 'cs_recall'
METRIC=bleu
./sort_by.sh $METRIC
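To keep a ranking for every metric instead of one at a time, a loop over the metric names listed above can write each result to its own file. This sketch assumes that sort_by.sh takes exactly one metric argument and prints its ranking to standard output; the function name rank_all, the SORT_BY override, and the output file naming are hypothetical.

```shell
# Hypothetical helper, not part of the repository.
# SORT_BY defaults to the repository script; override it for a dry run.
SORT_BY=${SORT_BY:-./sort_by.sh}

# Write one ranking per metric to OUTDIR/sorted_<metric>.txt
# (OUTDIR defaults to ./rankings), assuming sort_by.sh prints to stdout.
rank_all() {
  outdir=${1:-rankings}
  mkdir -p "$outdir" || return 1
  for metric in bleu co_distance rg_precision rg_number cs_precision cs_recall; do
    "$SORT_BY" "$metric" > "$outdir/sorted_$metric.txt" || return 1
  done
}
```

If sort_by.sh instead writes its own output files, the redirection is unnecessary and the loop alone is enough.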
Each stage of the model pipeline can be evaluated with the following command:
# could also be 'extractor' or 'planner'
STAGE=generator
# could also be 'test'
CORPUS=valid
./data2text.py evaluate --stage $STAGE --corpus $CORPUS
# to use content plans produced by the neural content planner instead of the templates:
./data2text.py evaluate --stage $STAGE --corpus $CORPUS --use-planner
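Since the evaluation command takes a single stage and a single corpus, a nested loop can cover every combination named in the comments above. A minimal sketch; the function name evaluate_all and the DATA2TEXT override are hypothetical, and any extra arguments (such as --use-planner) are forwarded unchanged to each call.

```shell
# Hypothetical helper, not part of the repository.
DATA2TEXT=${DATA2TEXT:-./data2text.py}

# Evaluate every stage on both evaluation corpora,
# stopping at the first failing command.
evaluate_all() {
  for corpus in valid test; do
    for stage in extractor planner generator; do
      "$DATA2TEXT" evaluate --stage "$stage" --corpus "$corpus" "$@" || return 1
    done
  done
}
```

Running evaluate_all --use-planner, for instance, would issue six evaluate calls, one per stage and corpus.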
To train the models yourself, use the following command:
# could also be 'extractor', 'planner' or 'generator'
STAGE=pipeline
./data2text.py train --stage $STAGE
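With STAGE=pipeline the whole pipeline is presumably trained in one run. To train the stages individually instead, for example to inspect each model before moving on, a loop can run them one after another and stop at the first failure. This is a sketch: running the stages in the order the comment above lists them (extractor, planner, generator) is an assumption about their dependencies, and train_stages and DATA2TEXT are hypothetical names.

```shell
# Hypothetical helper, not part of the repository.
DATA2TEXT=${DATA2TEXT:-./data2text.py}

# Train the given stages in order. Without arguments it falls back to
# extractor, planner, generator; that order is an assumption.
train_stages() {
  [ "$#" -gt 0 ] || set -- extractor planner generator
  for stage in "$@"; do
    echo "Training stage: $stage" >&2
    "$DATA2TEXT" train --stage "$stage" || return 1
  done
}
```

Passing stage names explicitly, as in train_stages planner generator, retrains only those stages.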
For advanced usage, check out the help output:
./data2text.py --help