This repository contains the code and the processed dataset for cross-lingual fact-to-long-text generation. The paper describing the methods has been accepted at the European Conference on Artificial Intelligence (ECAI 2023).
The processed XLAlign dataset is present in the XLAlign-Dataset directory. The directory contains a subdirectory for each language.
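To check which languages are available in a local checkout, the per-language layout described above can be inspected with a few lines of Python. This is a minimal sketch: the helper name `list_language_dirs` is hypothetical and not part of the repository, it assumes the script is run from the repository root, and it makes no assumption about the files inside each language subdirectory.

```python
from pathlib import Path


def list_language_dirs(dataset_root):
    """Return the sorted names of the immediate subdirectories of dataset_root.

    Hypothetical helper for illustration; it is not part of the repository code.
    """
    root = Path(dataset_root)
    if not root.is_dir():
        raise FileNotFoundError(f"dataset directory not found: {root}")
    # Keep only directories and skip hidden entries (names starting with ".").
    return sorted(
        entry.name
        for entry in root.iterdir()
        if entry.is_dir() and not entry.name.startswith(".")
    )


if __name__ == "__main__":
    # Assumption: the current working directory is the repository root.
    root = Path("XLAlign-Dataset")
    if root.is_dir():
        for name in list_language_dirs(root):
            print(name)
    else:
        print(f"{root} not found; run this from the repository root")
```

Each printed name corresponds to one language subdirectory of XLAlign-Dataset.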
The code is present within the XFLT-code directory and is organised as follows.
- clustering - This contains the code for training the fact organisation model
  - mT5-baseline - End-to-end clustering
  - statistical_clustering - Statistical spectral clustering
- dataset_prep - This contains code for data preprocessing
  - coverage_classifier - Code and data for training coverage prompt classifier
- eval_module - This contains the code for running evaluation using NLG metrics and the defined X-PARENT metrics
- generation - This contains code for training models using different methods
  - mT5-baseline - Training baseline mT5 method
  - prompt_uni - Training with coverage prompt
  - grounded_decoding - Inference with grounded decoding. Requires installing the modified transformers package included in the directory
- rl_msme - This contains code for training with RL rewards
The default hyperparameter settings can be found in the run bash files in the respective directories.
The requirements for all methods in the generation directory can be found in generation_reqs.txt. The requirements for RL training can be found in rl_reqs.txt.