This repo contains code for the paper "Characterizing and addressing the issue of oversmoothing in neural autoregressive sequence modeling".
Authors: Ilia Kulikov, Maksim Eremeev, Kyunghyun Cho
See reqs.txt for the requirements.
To use the provided scripts, set the following environment variables:
export OVERLAY_PATH= # only for SLURM + singularity
export CUDA_SIF_PATH= # only for SLURM + singularity
# used to construct fairseq --save-dir
export EXPERIMENTS_DIRECTORY_WMT=
export EXPERIMENTS_DIRECTORY_IWSLT=
# used to construct the fairseq data argument; we used data preprocessed with fairseq-preprocess
export DATA_IWSTL=
export DATA_WMT19_DEEN=
export DATA_WMT19_RUEN=
export DATA_WMT19_ENDE=
export DATA_WMT16_ENDE=
# used to construct fairseq --finetune-from-model
export PRETRAINED_MODEL_WMT19_DEEN=
export PRETRAINED_MODEL_WMT19_RUEN=
export PRETRAINED_MODEL_WMT19_ENDE=
export PRETRAINED_MODEL_WMT16_ENDE=
# fairseq user package
export FAIRSEQ_MODULE=
# this is only used for results post-processing
export RESULTS_DIRECTORY=
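Before launching a job it can help to confirm that none of these variables were left empty. The following is a minimal sketch, not part of the repo: the variable names are copied from the list above, while the helper name `missing_variables` and the split into two lists are assumptions made for illustration. Trim the lists to the experiments you actually plan to run.

```python
import os

# Names copied from the export list above.
REQUIRED_VARS = [
    "EXPERIMENTS_DIRECTORY_WMT",
    "EXPERIMENTS_DIRECTORY_IWSLT",
    "DATA_IWSTL",
    "DATA_WMT19_DEEN",
    "DATA_WMT19_RUEN",
    "DATA_WMT19_ENDE",
    "DATA_WMT16_ENDE",
    "PRETRAINED_MODEL_WMT19_DEEN",
    "PRETRAINED_MODEL_WMT19_RUEN",
    "PRETRAINED_MODEL_WMT19_ENDE",
    "PRETRAINED_MODEL_WMT16_ENDE",
    "FAIRSEQ_MODULE",
    "RESULTS_DIRECTORY",
]
# Only needed when running under SLURM + singularity.
SLURM_ONLY_VARS = ["OVERLAY_PATH", "CUDA_SIF_PATH"]


def missing_variables(environ=None, with_slurm=False):
    """Return the names of variables that are unset or empty (hypothetical helper)."""
    environ = os.environ if environ is None else environ
    names = REQUIRED_VARS + (SLURM_ONLY_VARS if with_slurm else [])
    return [name for name in names if not environ.get(name)]
```

Calling `missing_variables(with_slurm=True)` in the shell session that will submit the jobs returns an empty list once everything is set.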
Please use fairseq at the following commit to avoid issues caused by later changes to the codebase:
git clone https://github.com/pytorch/fairseq.git
cd fairseq
git checkout c6006678261bf5d52e2c744508b5ddd306cafebd
Install fairseq (from inside the fairseq directory):
pip install --editable ./
We use the SLURM workload manager to run all experiments in this work, and Nvidia RTX8000 GPUs to train and validate models. We provide SLURM scripts that use Singularity containers; each of them contains a block like this:
python ${PYTHON_SWEEP_PATH} --call_fn ${EXPERIMENT_NAME} --sweep_step ${SLURM_ARRAY_TASK_ID} --language ${LANGUAGE} | xargs python ${FAIRSEQ_MODULE}/train.py
Here the first python command calls the SLURM sweep factory, which generates command-line arguments. These arguments are then passed to the fairseq process via xargs. If you do not use SLURM or Singularity, you can run these commands directly.
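To illustrate the pattern described above, here is a minimal sketch of a sweep script that turns a step index into command-line arguments and prints them for xargs. It is not the repo's sweep factory: the grid, the function names, and the 0-based step indexing are assumptions. `--lr` and `--seed` are ordinary fairseq training options, and the values are purely illustrative.

```python
import argparse
import itertools

# Hypothetical grid; the real configurations live in the sweep_configs files.
GRID = {
    "--lr": [1e-4, 5e-4],
    "--seed": [1, 2, 3],
}


def sweep_arguments(sweep_step, grid=GRID):
    """Map a 0-based sweep step to one grid point and return CLI tokens."""
    keys = list(grid)
    combos = list(itertools.product(*(grid[key] for key in keys)))
    if not 0 <= sweep_step < len(combos):
        raise IndexError(f"sweep_step must be in [0, {len(combos)})")
    tokens = []
    for key, value in zip(keys, combos[sweep_step]):
        tokens += [key, str(value)]
    return tokens


def main(argv=None):
    parser = argparse.ArgumentParser()
    parser.add_argument("--sweep_step", type=int, required=True)
    args = parser.parse_args(argv)
    # One line on stdout, so that `| xargs python train.py` receives the tokens.
    print(" ".join(sweep_arguments(args.sweep_step)))
```

Each step selects exactly one combination, so an array job with one task per step covers the whole grid.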
WMT16 En-De: https://github.com/pytorch/fairseq/tree/main/examples/scaling_nmt#pre-trained-models
WMT19 Models (single models before fine-tuning): https://github.com/pytorch/fairseq/tree/main/examples/wmt19#pre-trained-single-models-before-finetuning
Note that the dict files and BPE codes from the WMT19 models are used by the data preparation scripts later.
We tested the proposed approach on the following datasets: IWSLT'17 {DE, FR, ZH}-EN, WMT'19 {RU, DE}-EN, WMT'19 EN-DE, WMT'16 EN-DE.
You can find bash scripts for data preprocessing in the /data directory.
At the time of publication of this work, the Google Drive link to the preprocessed WMT16 En-De data from Google had become invalid.
Please note: the BPE codes for WMT tasks come with the pretrained checkpoints. Please download checkpoints first.
fairseq preprocessing: we use the fairseq preprocessing command (fairseq-preprocess). For WMT tasks, where the pretrained checkpoints come with corresponding dictionaries, the command takes these dictionary files as input.
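As a rough illustration of the preprocessing step described above, the sketch below assembles a fairseq-preprocess call that reuses existing dictionaries. The function name, the train/valid/test file prefixes, and all paths are hypothetical; the scripts in /data remain the reference for what was actually run.

```python
import shlex


def preprocess_command(src, tgt, text_prefix, dest_dir,
                       src_dict=None, tgt_dict=None, workers=8):
    """Build a fairseq-preprocess argument list (hypothetical helper)."""
    cmd = [
        "fairseq-preprocess",
        "--source-lang", src,
        "--target-lang", tgt,
        # Assumed layout: BPE-encoded files named train.*, valid.*, test.*
        "--trainpref", f"{text_prefix}/train",
        "--validpref", f"{text_prefix}/valid",
        "--testpref", f"{text_prefix}/test",
        "--destdir", dest_dir,
        "--workers", str(workers),
    ]
    # Reuse the dictionaries shipped with a pretrained checkpoint, if given.
    if src_dict:
        cmd += ["--srcdict", src_dict]
    if tgt_dict:
        cmd += ["--tgtdict", tgt_dict]
    return cmd


def as_shell(cmd):
    """Render the argument list as a copy-pasteable shell command."""
    return shlex.join(cmd)
```

Passing the checkpoint's dictionaries keeps the token indices of the binarized data aligned with the pretrained model's embeddings. When no dictionaries are passed, fairseq-preprocess builds them from the training data.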
The model configurations we used in the experiments can be found in the /sweep_configs directory. Each file contains the experiment setups for a specific dataset. To train a model, run:
python sweep_configs/CONFIG_SCRIPT --call_fn EXPERIMENT_NAME --sweep_step CONFIG_SERIAL_NUMBER (--language SOURCE_LANGUAGE) | xargs python fairseq_module/train.py
You can find more examples in the /slurm_files directory. Note that the --language parameter applies only to IWSLT experiments. CONFIG_SERIAL_NUMBER is the sweep step, which selects the specific configuration to run.
We prepared a custom validation script that computes and stores all necessary statistics in a pickle file. In the post-processing step, we parse these files to compute average statistics across training runs. To run the validation, execute the following command:
python sweep_configs/CONFIG_SCRIPT --call_fn validate_trained_sweep_ontest --experiment_name_to_validate EXPERIMENT_NAME --sweep_step CONFIG_SERIAL_NUMBER --beam BEAM_SIZE (--language SOURCE_LANGUAGE) | xargs python fairseq_module/validate.py
You can find more examples in the /slurm_files directory. Note that the --language parameter applies only to IWSLT experiments.
The /parsers directory contains several tools that automatically process the validation pickles.
The following command executes the post-processing:
python parsers/process_extra_state_pickles.py -p EXPERIMENT_RESULTS_PATH -b BEAM_SIZE -o OUTPUT_DIRECTORY
You can find more examples in the /parsers/submit_jobs* SLURM files.
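The idea of averaging statistics across training runs can be sketched as follows. This is an assumption-laden illustration, not the logic of process_extra_state_pickles.py: it supposes that each pickle holds a flat dict mapping a statistic name to a number, which may not match the real layout, and both function names are hypothetical.

```python
import pickle
from collections import defaultdict
from statistics import mean


def load_statistics(path):
    """Load one validation pickle; assumed to hold a dict of name -> number."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


def average_statistics(runs):
    """Average each statistic over the runs in which it appears."""
    values = defaultdict(list)
    for stats in runs:
        for name, value in stats.items():
            values[name].append(float(value))
    return {name: mean(vals) for name, vals in values.items()}
```

For example, `average_statistics(load_statistics(p) for p in paths)` would produce one averaged dict for a list of pickle paths belonging to the same configuration.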
Run the /experiments.ipynb notebook to render the plots used in the paper.
If you want to cite our work, please use the following BibTeX entry:
@misc{kulikov2021characterizing,
title={Characterizing and addressing the issue of oversmoothing in neural autoregressive sequence modeling},
author={Ilia Kulikov and Maksim Eremeev and Kyunghyun Cho},
year={2021},
eprint={2112.08914},
archivePrefix={arXiv},
primaryClass={cs.LG}
}