BotsTalk: Machine-Sourced Framework for Automatic Curation of Large-scale Multi-skill Dialogue Datasets
Official PyTorch implementation of our EMNLP paper:
Minju Kim*, Chaehyeong Kim*, Yongho Song*, Seung-won Hwang and Jinyoung Yeo. BotsTalk: Machine-Sourced Framework for Automatic Curation of Large-scale Multi-skill Dialogue Datasets. EMNLP, 2022 [Paper] (* equal contribution)
If you use the materials in this repository as part of any published research, we ask you to cite the following paper:
@inproceedings{Kim2022botstalk,
title={BotsTalk: Machine-Sourced Framework for Automatic Curation of Large-scale Multi-skill Dialogue Datasets},
author={Kim, Minju and Kim, Chaehyeong and Song, Yongho and Hwang, Seung-won and Yeo, Jinyoung},
booktitle={EMNLP},
year=2022
}
You can download the version of our BSBT dataset used in the paper here.
python scripts/self_mix.py \
--subtasks convai2,wizard_of_wikipedia,empatheticdialogues \
--num-self-mixs 5 \
--selfmix-max-turns 6 \
--datatype train \
--expert-model-files zoo:dodecadialogue/convai2_ft/model,zoo:dodecadialogue/wizard_of_wikipedia_ft/model,zoo:dodecadialogue/empathetic_dialogues_ft/model \
--expert-model-opt-files opt_files/conv.opt,opt_files/wow.opt,opt_files/ed.opt \
--display-examples True \
--task convai2 --seed_messages_from_task 1 \
--model-file zoo:dodecadialogue/convai2_ft/model \
--skip-generation False --inference nucleus \
--beam-size 3 \
--beam-min-length 10 --beam-block-ngram 3 --beam-context-block-ngram 3 \
--save-format parlai \
--ranker-model-files zoo:pretrained_transformers/model_poly/model,/your_path/empathetic_dialogues_poly/model.checkpoint,/your_path/wizard_of_wikipedia_poly/model.checkpoint \
--outfile your_path/output/test_files.txt
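The command above passes several comma-separated lists (`--subtasks`, `--expert-model-files`, `--expert-model-opt-files`, `--ranker-model-files`), each with three entries. The following is a minimal sketch of a pre-flight check, assuming that every such list must contain one entry per subtask. The helper `check_per_skill_flags` is hypothetical and not part of this repository. The flag values are copied from the command above.

```python
# Hypothetical pre-flight check; not part of the BotsTalk repository.
# Assumption: each per-skill flag needs exactly one entry per subtask.

# Values copied verbatim from the command above.
SUBTASKS = "convai2,wizard_of_wikipedia,empatheticdialogues"
PER_SKILL_FLAGS = {
    "--expert-model-files": (
        "zoo:dodecadialogue/convai2_ft/model,"
        "zoo:dodecadialogue/wizard_of_wikipedia_ft/model,"
        "zoo:dodecadialogue/empathetic_dialogues_ft/model"
    ),
    "--expert-model-opt-files": "opt_files/conv.opt,opt_files/wow.opt,opt_files/ed.opt",
    "--ranker-model-files": (
        "zoo:pretrained_transformers/model_poly/model,"
        "/your_path/empathetic_dialogues_poly/model.checkpoint,"
        "/your_path/wizard_of_wikipedia_poly/model.checkpoint"
    ),
}


def check_per_skill_flags(subtasks, per_skill_flags):
    """Return the number of subtasks, or raise ValueError on a length mismatch."""
    num_skills = len(subtasks.split(","))
    for flag, value in per_skill_flags.items():
        count = len(value.split(","))
        if count != num_skills:
            raise ValueError(
                f"{flag} lists {count} entries but --subtasks lists {num_skills}"
            )
    return num_skills


if __name__ == "__main__":
    print(check_per_skill_flags(SUBTASKS, PER_SKILL_FLAGS))
```

The check only counts entries. It does not verify that the entries in each list appear in the same skill order as `--subtasks`, so confirm that separately when you add or swap a skill.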
For questions, please contact Minju Kim at minnju@yonsei.ac.kr.
This repository is MIT licensed. See the LICENSE file for details.