
WikiTableT

Code, data, and pretrained models for the paper "Generating Wikipedia Article Sections from Diverse Data Sources"

Note: we refer to the section data as hyperlink data in both the processed json files and the codebase.

Resources

Dependencies

Usage

To train a new model, you may use a command similar to scripts/train_large_copy_cyc.sh.
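A minimal launch sketch, assuming you run it from the repository root; the training hyperparameters themselves are configured inside the script, and the guard below is only illustrative:

```shell
# Hypothetical launcher: the script path comes from the README,
# everything else (the existence check, the message) is illustrative.
TRAIN_SCRIPT=scripts/train_large_copy_cyc.sh

if [ -f "$TRAIN_SCRIPT" ]; then
    bash "$TRAIN_SCRIPT"
else
    echo "$TRAIN_SCRIPT not found; run this from the repository root"
fi
```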

To perform beam search generation with a trained model, you may use a command similar to scripts/generate_beam_search.sh. The process generates 4 files, including references; 2 of them are tokenized with NLTK for use in the later evaluation steps.

If you want to generate your own version of the reference data for computing PARENT scores, use a command similar to scripts/convert2parent_dev.sh.

Once you have the generated file, you can evaluate it against the reference with the command scripts/eval_dev.sh REF_FILE_PATH GEN_FILE_PATH. Make sure you pass the tokenized files.
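A sketch of the evaluation call with its two positional arguments; the script name and argument order come from the README, but the file names below are hypothetical placeholders for the NLTK-tokenized outputs of the generation step:

```shell
# Hypothetical file names -- substitute the tokenized reference and
# generation files actually produced by your generation run.
REF_FILE_PATH=outputs/dev.ref.tok
GEN_FILE_PATH=outputs/dev.gen.tok

# Evaluate only if the helper script is present (i.e. we are in the
# repository root); the guard itself is illustrative.
if [ -f scripts/eval_dev.sh ]; then
    bash scripts/eval_dev.sh "$REF_FILE_PATH" "$GEN_FILE_PATH"
else
    echo "scripts/eval_dev.sh not found; run this from the repository root"
fi
```

Note that the reference file comes first and the generated file second; swapping them can silently change precision/recall-style metrics.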

Acknowledgement

Part of the code in this repository is adapted from the following repositories: