MIKe (SIGIR 2021 full paper)

The code for Initiative-Aware Self-Supervised Learning for Knowledge-Grounded Conversations.

Please contact Chuan Meng (chuan.meng@outlook.com) if you have any questions.

Reference

If you use any source code included in this repo in your work, please cite the following paper.

@inproceedings{meng2021initiative,
author = {Meng, Chuan and Ren, Pengjie and Chen, Zhumin and Ren, Zhaochun and Xi, Tengxiao and Rijke, Maarten de},
title = {Initiative-Aware Self-Supervised Learning for Knowledge-Grounded Conversations},
year = {2021},
booktitle = {SIGIR},
pages = {522--532},
}

Requirements

python 3.6
pytorch 1.2.0-1.4.0
transformers 2.6

Datasets

We use the Wizard of Wikipedia dataset and the modified Holl-E dataset released by Kim et al. Both datasets have been processed into our own format and can be used directly by our model.

The datasets can be downloaded from here. After downloading, create a folder named datasets in the root directory and put the files in it.

Running the Code

All our experiments were conducted on a single NVIDIA TITAN RTX GPU (24GB). We recommend a GPU with 24GB of memory; if yours has less, you can reduce the batch size.

To save you time, we have uploaded our pretrained checkpoints for both datasets; they can be downloaded from here.

Please create a folder named output in the root directory and put the checkpoint files in it.
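Before running anything, it may help to confirm that both folders described above are in place. The following is a minimal sketch, not part of this repo; the helper name missing_dirs is hypothetical, and it only checks that the datasets and output folders exist under the given root, not that their contents are complete.

```python
from pathlib import Path


def missing_dirs(root, names=("datasets", "output")):
    """Return the expected folder names that do not exist under root.

    The default names are the two folders this README asks you to create.
    This helper is illustrative only and is not part of the MIKe code.
    """
    root = Path(root)
    return [name for name in names if not (root / name).is_dir()]


if __name__ == "__main__":
    # Run from the repository root; an empty list means both folders exist.
    print("missing folders:", missing_dirs("."))
```

If the list is not empty, create the listed folders and place the downloaded files in them before running inference.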

Using pretrained checkpoints

To run inference directly on the Wizard of Wikipedia dataset, run:

python -u MIKe/Run.py --name MIKe_WoW --dataset wizard_of_wikipedia --mode inference

Wizard of Wikipedia (test seen)
{"F1": 19.7,
"BLEU-1": 18.62,
"BLEU-2": 8.13,
"BLEU-3": 4.5,
"BLEU-4": 2.75,
"ROUGE_1_F1": 26.02,
"ROUGE_2_F1": 7.33,
"ROUGE_L_F1": 19.13,
"METEOR": 18.07,
"ks_acc": 28.56}

Wizard of Wikipedia (test unseen)
{"F1": 17.12,
"BLEU-1": 16.92,
"BLEU-2": 6.31,
"BLEU-3": 3.16,
"BLEU-4": 1.91,
"ROUGE_1_F1": 23.63,
"ROUGE_2_F1": 5.49,
"ROUGE_L_F1": 17.25,
"METEOR": 15.64,
"ks_acc": 21.22}

Note: Inference creates test_seen_result.json (test_unseen_result.json) and test_seen_xx.txt (test_unseen_xx.txt) in the directory output/MIKe_WoW. The former records the model's performance on the automatic evaluation metrics, and the latter records the responses generated by the model.

To run inference directly on the Holl-E dataset, run:
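To inspect a result file after inference, a short script along these lines may be convenient. It is a sketch under one assumption: that the result file holds a single flat JSON object mapping metric names to numbers, as in the listings above. The function names load_metrics and format_metrics are hypothetical and not part of this repo.

```python
import json
from pathlib import Path


def load_metrics(path):
    """Read a result file and return its metrics as a dict.

    Assumes the file is one flat JSON object of metric name -> number.
    """
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def format_metrics(metrics):
    """Render metrics as aligned 'name  value' lines, sorted by name."""
    width = max(len(name) for name in metrics)
    lines = [
        f"{name:<{width}}  {value:.2f}"
        for name, value in sorted(metrics.items())
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    # Path taken from the note above; adjust for test_unseen or Holl-E runs.
    result_file = Path("output/MIKe_WoW/test_seen_result.json")
    if result_file.exists():
        print(format_metrics(load_metrics(result_file)))
    else:
        print("result file not found; run inference first")
```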

python -u MIKe/Run.py --name MIKe_Holl_E --dataset holl_e --mode inference

Holl-E (single golden reference)
{"F1": 32.11,
"BLEU-1": 31.41,
"BLEU-2": 24.06,
"BLEU-3": 22.1,
"BLEU-4": 21.17,
"ROUGE_1_F1": 38.06,
"ROUGE_2_F1": 25.17,
"ROUGE_L_F1": 33.12,
"METEOR": 32.74,
"ks_acc": 32.71}

To get the results in the multiple golden references setting, run:

python -u MIKe/CumulativeTrainer.py 

Holl-E (multiple golden references)
{"F1": 38.37,
"BLEU-1": 40.98,
"BLEU-2": 31.85,
"BLEU-3": 29.21,
"BLEU-4": 28.10,
"ROUGE_1_F1": 44.08,
"ROUGE_2_F1": 31.48,
"ROUGE_L_F1": 38.99,
"METEOR": 38.81,
"ks_acc": 41.32}
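For readers unfamiliar with multi-reference evaluation, the sketch below illustrates one common convention: score the generated response against each golden reference and keep the best score. This is an assumption for illustration only; it is not confirmed to be what MIKe/CumulativeTrainer.py does, and the whitespace tokenization and function names here are hypothetical simplifications.

```python
from collections import Counter


def unigram_f1(prediction, reference):
    """Unigram F1 between two strings, using lowercase whitespace tokens."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    if not pred_tokens or not ref_tokens:
        return 0.0
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


def multi_reference_f1(prediction, references):
    """Best unigram F1 over all references (assumed max-over-references rule)."""
    return max(unigram_f1(prediction, ref) for ref in references)


if __name__ == "__main__":
    refs = ["it was bad", "the movie was great"]
    print(multi_reference_f1("the movie was great", refs))
```

Under this convention a multi-reference score can never be lower than the single-reference score for the same response, which is consistent with the higher numbers reported in the multiple golden references setting.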

Retraining

The related code will be released in a few days...