# Out-of-Distribution Generalization Challenge in Dialog State Tracking
This repository contains the official implementation of the paper [Out-of-Distribution Generalization Challenge in Dialog State Tracking](https://openreview.net/forum?id=Z-k91NB8Eh).
## Abstract
Dialog State Tracking (DST) is a core component for multi-turn Task-Oriented Dialog (TOD) systems to understand the dialogs. DST models need to generalize to Out-of-Distribution (OOD) utterances due to the open environments dialog systems face. Unfortunately, utterances in TOD are multi-labeled, and most of them appear in specific contexts (i.e., the dialog histories). Both characteristics make them different from the conventional focus of OOD generalization research, and they remain unexplored. In this paper, we formally define OOD utterances in TOD and evaluate the generalizability of existing competitive DST models on such utterances. Our experimental results show that the performance of all models drops considerably in dialogs with OOD utterances, indicating an OOD generalization challenge in DST.
## Installation
This code is written in Python 3.8.3. For the packages required to run it, please refer to `requirements.txt`.
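For example, a minimal setup using a virtual environment (a sketch assuming Python 3.8 and `pip` are available) could look like this:

```bash
# Create and activate an isolated environment
python3.8 -m venv .venv
source .venv/bin/activate

# Install the pinned dependencies
pip install -r requirements.txt
```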
## Preparation
Please download the MultiWOZ 2.3 dataset from its official repository and place the downloaded files under the `datasets` folder. You should then have the following directory structure:
```
DST_OOD
├── datasets
│   ├── MultiWOZ2_3
│   │   ├── data.json
│   │   ├── dialogue_acts.json
│   │   └── ontology.json
│   └── ...
└── ...
```
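Before running the preparation script, you can sanity-check that the three MultiWOZ 2.3 files are in place (a minimal sketch based on the layout above, run from the repository root):

```bash
for f in data.json dialogue_acts.json ontology.json; do
  [ -f "datasets/MultiWOZ2_3/$f" ] && echo "found $f" || echo "missing $f"
done
```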
Then, prepare the training data and OOD test data by running:
```bash
cd datasets
python DataInit.py
```
This will create `MultiWOZ_OoD` under the `datasets` folder. Besides the training and validation data copied from the original dataset, `MultiWOZ_OoD` contains test data of different types, both taken from the original test set and generated (i.e., the MultiWOZ OOD test set described in our paper). The script also preprocesses the data for three DST methods (i.e., SimpleTOD, Trippy, and TRADE) and saves the preprocessed data in `MultiWOZ_OoD_${method}`.
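To confirm that preprocessing succeeded, you can list the generated folders. Assuming `${method}` follows the spellings used by `experiments.py` below (SimpleTOD, Trippy, Trade), you should see `MultiWOZ_OoD` plus one folder per method:

```bash
# List the OOD data folders produced by DataInit.py (run from the repository root)
ls -d datasets/MultiWOZ_OoD*
```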
## Training
To train the models, run:
```bash
python experiments.py --method SimpleTOD --devices 0 --pretrained gpt2 --train
python experiments.py --method Trippy --devices 0 --pretrained bert-base-uncased --train
python experiments.py --method Trade --devices 0 --pretrained none --train
```
You can accelerate training with distributed data parallel training by assigning multiple devices. For example:

```bash
python experiments.py --method SimpleTOD --devices 0 1 2 3 --pretrained gpt2 --train
```
By default, the model checkpoints are saved under `${method}/checkpoints/`.
We also share our trained checkpoints at xxx (TBD).
## Testing
To test the models, run:
```bash
python experiments.py --method SimpleTOD --devices 0 --pretrained gpt2 --ood --checkpoint SimpleTOD/checkpoints/${ckpt_folder}
python experiments.py --method Trippy --devices 0 --pretrained bert-base-uncased --ood --checkpoint Trippy/checkpoints/${ckpt_folder}
python experiments.py --method Trade --devices 0 --pretrained none --ood --checkpoint Trade/checkpoints/${ckpt_folder}
```
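Here `${ckpt_folder}` is the name of a checkpoint folder produced during training. One way to select the most recent one (a sketch assuming checkpoint folders sort by modification time) is:

```bash
# Pick the newest checkpoint folder for SimpleTOD and run the OOD test
ckpt_folder=$(ls -t SimpleTOD/checkpoints/ | head -n 1)
python experiments.py --method SimpleTOD --devices 0 --pretrained gpt2 --ood \
    --checkpoint "SimpleTOD/checkpoints/${ckpt_folder}"
```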
## License
The code is released under the BSD 3-Clause license; see LICENSE for details.
This code includes open-source code from SimpleTOD, Trippy, and TRADE. These components have their own licenses; please refer to their official repositories.