This source code corresponds to our paper "Towards Table-to-Text Generation with Pretrained Language Model: A Table Structure Understanding and Text Deliberating Approach" (EMNLP 2022).
If you find this code or any of the ideas in this paper useful, please consider citing:
@inproceedings{chen2022towards,
title={Towards Table-to-Text Generation with Pretrained Language Model: A Table Structure Understanding and Text Deliberating Approach},
author={Miao Chen and Xinjiang Lu and Tong Xu and Yanyan Li and Jingbo Zhou and Dejing Dou and Hui Xiong},
booktitle={The 2022 Conference on Empirical Methods in Natural Language Processing (EMNLP '22)},
year={2022}
}
We propose a table-to-text approach, named TASD, that builds on a pretrained language model and combines table structure understanding with text deliberation.
The framework overview is as follows:
For more details about our approach, please refer to the preprint version of the paper. We implement TASD with three deep learning frameworks: PyTorch, TensorFlow, and PaddlePaddle.
Run the following command to install the dependencies for the deep learning framework you intend to use:
pip install -r {$path_to_certain_framework_folder}/requirements.txt
Unzip the datasets in the data folder:
cd data
unzip numericNLG.zip
unzip Totto.zip
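If the `unzip` tool is not available (for example on Windows), the same step can be done with Python's standard `zipfile` module. This is a minimal sketch, assuming it is run from the repository root and that each archive should be extracted into `data/`, just as running `unzip` inside that folder would do; the helper name `extract_datasets` is hypothetical and not part of the TASD code.

```python
import zipfile
from pathlib import Path


def extract_datasets(data_dir="data", archives=("numericNLG.zip", "Totto.zip")):
    """Hypothetical helper mirroring `cd data && unzip <archive>` for each archive.

    Returns the member names extracted from all archives, in order.
    """
    data_path = Path(data_dir)
    extracted = []
    for name in archives:
        archive = data_path / name
        if not archive.is_file():
            # Fail early with a clear message instead of a zipfile error.
            raise FileNotFoundError(f"missing archive: {archive}")
        with zipfile.ZipFile(archive) as zf:
            # Extract next to the archive, as `unzip` does inside data/.
            zf.extractall(data_path)
            extracted.extend(zf.namelist())
    return extracted
```

Calling `extract_datasets()` from the repository root unpacks both archives; passing a different `archives` tuple extracts only the datasets you need.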
Run the following scripts to download the pretrained models and modify their config.json files to fit TASD:
cd models
python paddle.py
python pytorch.py
For numericNLG (PaddlePaddle):
cd code/paddle
python preprocess.py numericNLG
sh pipeline.sh numericNLG 21 3 1e-6 gpt2-en 2 21 0
For ToTTo (PaddlePaddle):
cd code/paddle
python preprocess.py Totto
sh pipeline.sh Totto 21 3 1e-6 gpt2-en 2 21 0
For numericNLG (PyTorch):
cd code/pytorch
sh pipeline.sh 21 3 2 128 medium 1e-5 -1 4 0,1,2,3 4 numericNLG first
For ToTTo (PyTorch):
cd code/pytorch
sh pipeline.sh 21 3 2 128 medium 1e-5 -1 4 0,1,2,3 4 Totto first
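The PyTorch commands for numericNLG and ToTTo differ only in the dataset argument, and the PaddlePaddle steps also take the dataset name as their first argument. A small driver that takes the dataset name once keeps every step consistent. The following is a minimal sketch using `subprocess`, assuming it is run from the repository root. The function names `build_commands` and `run` are hypothetical, TensorFlow is omitted because its commands are not listed here, and the remaining positional arguments are copied verbatim from the commands above without interpreting their meaning.

```python
import subprocess

# Positional arguments copied verbatim from the commands above. Their meanings
# are defined by each pipeline.sh and are deliberately not interpreted here.
PADDLE_ARGS = ("21", "3", "1e-6", "gpt2-en", "2", "21", "0")
PYTORCH_ARGS = ("21", "3", "2", "128", "medium", "1e-5", "-1", "4", "0,1,2,3", "4")


def build_commands(framework, dataset):
    """Return (working directory, commands) for one framework/dataset pair."""
    if framework == "paddle":
        return "code/paddle", [
            ["python", "preprocess.py", dataset],
            ["sh", "pipeline.sh", dataset, *PADDLE_ARGS],
        ]
    if framework == "pytorch":
        return "code/pytorch", [
            ["sh", "pipeline.sh", *PYTORCH_ARGS, dataset, "first"],
        ]
    raise ValueError(f"unsupported framework: {framework!r}")


def run(framework, dataset, dry_run=False):
    """Print each command and, unless dry_run is set, execute it in order."""
    workdir, commands = build_commands(framework, dataset)
    for cmd in commands:
        print(f"[{workdir}] $ {' '.join(cmd)}")
        if not dry_run:
            # check=True stops at the first failing step (CalledProcessError).
            subprocess.run(cmd, cwd=workdir, check=True)
    return commands
```

For example, `run("pytorch", "Totto")` reproduces the last command above, while `run("paddle", "numericNLG", dry_run=True)` only prints the two PaddlePaddle steps without executing them.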