Code and released pre-trained models for our ACL 2022 paper: DialogVED: A Pre-trained Latent Variable Encoder-Decoder Model for Dialog Response Generation.
- Fixed bugs in dailydialog and updated the training and evaluation scripts. (2022.06.19)
- Optimized the code structure and removed redundant code. (2022.05.29)
- Pretrained checkpoints of DialogVED have been released! (2022.05.17)
- An fp16 version of DialogVED will be released, about 700M in size.
- Pre-training scripts are scheduled to be released.
- python==3.7
- torch==1.3.0
- fairseq==0.9.0
- tensorboardX==1.7
- pytorch_transformers
- sklearn
- nltk==3.5
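Before installing the dependencies below, you may want to create an isolated Python 3.7 environment. A minimal sketch, assuming conda is available (any virtualenv tool works equally well):

```bash
# Create and activate a dedicated Python 3.7 environment (conda assumed; adjust to your tooling)
conda create -n dialogved python=3.7 -y
conda activate dialogved
```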
sudo apt install default-jdk
curl https://install.meteor.com/ | sh
pip install -r requirements.txt
We have released the following checkpoints for the pre-trained models described in the DialogVED paper. Download a pre-trained checkpoint and set the `load-from-pretrained-model` parameter in the fine-tuning command.
Note: DialogVED-VAE-Standard has a latent size of 32, while DialogVED-VAE-Large has a latent size of 64. DialogVED-Seq2Seq has no latent variable; it is a pure seq2seq model trained with the same settings as DialogVED. It may perform better in scenarios where response diversity is less important.
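For example, once a checkpoint has been downloaded it can be stored locally and later passed to the fine-tuning script as the pre-trained model path. The URL below is a placeholder; substitute the actual link from the checkpoint list above:

```bash
# Placeholder URL: replace <checkpoint-url> with the released download link
mkdir -p /remote-home/models
wget -O /remote-home/models/dialogved_standard.pt <checkpoint-url>/dialogved_standard.pt
```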
We fine-tune DialogVED on three datasets: DailyDialog, PersonaChat and DSTC7-AVSD. You can download them following the instructions in PLATO, or run our scripts as follows.
bash preprocess/get_data.sh   # download the raw datasets
bash preprocess/process.sh    # preprocess the raw data
bash preprocess/binarize.sh   # binarize the processed data for training
The script `train.sh` has three parameters, namely `p`, `t` and `d`.

- `p`: pre-trained model path
- `t`: pre-trained model type (`dialogved_standard`, `dialogved_large` or `dialogved_seq2seq`)
- `d`: fine-tuned dataset (`dailydialog`, `personachat` or `dstc7avsd`)
bash train.sh -p /remote-home/models/dialogved_standard.pt -t dialogved_standard -d dailydialog
The script `infer.sh` has two parameters, namely `d` and `s`.

- `d`: fine-tuned dataset (`dailydialog`, `personachat` or `dstc7avsd`)
- `s`: decoding strategy (`greedy`, `beam` or `sampling`)
bash infer.sh -d dailydialog -s beam
The script `eval.sh` has one parameter, namely `d`.

- `d`: fine-tuned dataset (`dailydialog`, `personachat` or `dstc7avsd`)
bash eval.sh -d dailydialog
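Putting the three scripts together, a typical fine-tune, inference and evaluation pass on DailyDialog with the standard model looks like this (checkpoint path as in the example above):

```bash
# 1. Fine-tune DialogVED-VAE-Standard on DailyDialog
bash train.sh -p /remote-home/models/dialogved_standard.pt -t dialogved_standard -d dailydialog

# 2. Generate responses on the test set with beam search
bash infer.sh -d dailydialog -s beam

# 3. Compute automatic evaluation metrics on the generated responses
bash eval.sh -d dailydialog
```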
If you extend or use this work, please cite the paper where it was introduced:
@inproceedings{chen-etal-2022-dialogved,
title = "{DialogVED: A Pre-trained Latent Variable Encoder-Decoder Model for Dialog Response Generation",
author = "Chen, Wei and Gong, Yeyun and Wang, Song and Yao, Bolun and Qi, Weizhen and Wei, Zhongyu and Hu, Xiaowu and Zhou, Bartuer and Mao, Yi and Chen, Weizhu and Cheng, Biao and Duan, Nan",
booktitle = "Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
month = may,
year = "2022",
address = "Dublin, Ireland",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2022.acl-long.333",
doi = "10.18653/v1/2022.acl-long.333",
pages = "4852--4864",
abstract = "Dialog response generation in open domain is an important research topic where the main challenge is to generate relevant and diverse responses. In this paper, we propose a new dialog pre-training framework called DialogVED, which introduces continuous latent variables into the enhanced encoder-decoder pre-training framework to increase the relevance and diversity of responses. With the help of a large dialog corpus (Reddit), we pre-train the model using the following 4 tasks, used in training language models (LMs) and Variational Autoencoders (VAEs) literature: 1) masked language model; 2) response generation; 3) bag-of-words prediction; and 4) KL divergence reduction. We also add additional parameters to model the turn structure in dialogs to improve the performance of the pre-trained model. We conduct experiments on PersonaChat, DailyDialog, and DSTC7-AVSD benchmarks for response generation. Experimental results show that our model achieves the new state-of-the-art results on all these datasets.",
}