Code, data, and pretrained models for the paper "Generating Wikipedia Article Sections from Diverse Data Sources"
Note: we refer to the section data as hyperlink data in both the processed json files and the codebase.
- WikiTableT dataset
- multi-bleu and METEOR scoring tools
- Trained models (base+copy+cyc (trained on 500k instances) and large+copy+cyc (trained on the full dataset))
- BPE code and vocab (we used https://github.com/rsennrich/subword-nmt; see the example after this list)
- Data for computing the PARENT scores
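As an illustrative sketch (not a command from this repository), the released BPE codes and vocabulary could be applied to tokenized text with subword-nmt as follows; the file names and the vocabulary threshold are placeholders:

```bash
# Install the subword-nmt package (https://github.com/rsennrich/subword-nmt).
pip install subword-nmt

# Apply the released BPE codes, optionally restricting merges to the released vocabulary.
# All file names below are placeholders; substitute the downloaded BPE code/vocab files
# and your own tokenized input.
subword-nmt apply-bpe \
    -c bpe.codes \
    --vocabulary bpe.vocab \
    --vocabulary-threshold 50 \
    < input.tok.txt > input.bpe.txt
```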
To train a new model, you may use a command similar to scripts/train_large_copy_cyc.sh.
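As a sketch, the training script could be launched as follows; it assumes the data and output paths configured inside the script match your local setup:

```bash
# Train the large+copy+cyc configuration; edit the paths and settings
# inside the script to match your environment before running.
bash scripts/train_large_copy_cyc.sh
```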
To perform beam search generation using a trained model, you may use a command similar to scripts/generate_beam_search.sh. The process should generate 4 files, including references; 2 of them are tokenized using NLTK for the convenience of later evaluation steps.
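A possible invocation, assuming the checkpoint and output paths inside the script have been set for your environment:

```bash
# Run beam search generation with a trained model; the checkpoint and
# output locations are configured inside the script.
bash scripts/generate_beam_search.sh
```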
If you want to generate your own version of the reference data for computing the PARENT scores, use a command similar to scripts/convert2parent_dev.sh.
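For example, assuming the input and output paths inside the script point at your copy of the dev data:

```bash
# Convert the dev references into the format expected by the PARENT metric;
# adjust the paths inside the script as needed.
bash scripts/convert2parent_dev.sh
```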
Once you have the generated file, you may evaluate it against the reference using the command scripts/eval_dev.sh REF_FILE_PATH GEN_FILE_PATH. Please make sure that you are using the tokenized files.
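A sketch of the evaluation call, where dev_ref.tok.txt and dev_gen.tok.txt are hypothetical placeholders for the NLTK-tokenized reference and generated files:

```bash
# Both arguments must point to the tokenized files produced by the generation step.
bash scripts/eval_dev.sh dev_ref.tok.txt dev_gen.tok.txt
```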
Part of the code in this repository is adapted from the following repositories: