Continual Learning for Task-Oriented Dialogue Systems


This repository includes the dataset and baselines of the paper:

Continual Learning for Task-Oriented Dialogue Systems (Accepted in EMNLP 2021) [PDF].

Authors: Andrea Madotto, Zhaojiang Lin, Zhenpeng Zhou, Seungwhan Moon, Paul Crook, Bing Liu, Zhou Yu, Eunjoon Cho, Zhiguang Wang, Pascale Fung


Continual learning in task-oriented dialogue systems allows the system to add new domains and functionalities over time after deployment, without incurring the high cost of retraining the whole system each time. In this paper, we propose a first-ever continual learning benchmark for task-oriented dialogue systems with 37 domains to be learned continuously in both modularized and end-to-end learning settings. In addition, we implement and compare multiple existing continual learning baselines, and we propose a simple yet effective architectural method based on residual adapters. We also suggest that the upper bound performance of continual learning should be equivalent to multitask learning when data from all domains is available at once. Our experiments demonstrate that the proposed architectural method and a simple replay-based strategy perform better, by a large margin, compared to other continual learning techniques, and only slightly worse than the multitask learning upper bound while being 20X faster in learning new domains. We also report several trade-offs in terms of parameter usage, memory size and training time, which are important in the design of a task-oriented dialogue system. The proposed benchmark is released to promote more research in this direction.


The Continual Learning benchmark is created by jointly pre-processing four task-oriented datasets: Task-Master (TM19), Task-Master 2020 (TM20), Schema Guided Dialogue (SGD), and MultiWoZ. To download the dataset and set up the required packages, use:

pip install -r requirements.txt
cd data
bash download.sh

If you are interested in the pre-processing, please check utils/preprocess.py and utils/dataloader.py.

Basic Running

In this codebase, we implemented several baselines (MULTI, VANILLA, L2, EWC, AGEM, LAMOL, REPLAY, ADAPTER) and four ToD settings (INTENT, DST, NLG, E2E). An example of running the NLG task with the VANILLA method is:

CUDA_VISIBLE_DEVICES=0 python train.py --CL VANILLA --task_type NLG

Different CL methods use different hyperparameters. For example, in REPLAY you can tune the episodic memory size as follows:

CUDA_VISIBLE_DEVICES=0 python train.py --CL REPLAY --task_type NLG --episodic_mem_size 10

This randomly samples 10 examples per task and replays them while learning new ones. The full set of commands to run each baseline on the NLG task is:

CUDA_VISIBLE_DEVICES=0 python train.py --task_type NLG --CL MULTI 
CUDA_VISIBLE_DEVICES=0 python train.py --task_type NLG --CL VANILLA --n_epochs 1 
CUDA_VISIBLE_DEVICES=0 python train.py --task_type NLG --CL L2 --reg 0.01 --n_epochs 1 
CUDA_VISIBLE_DEVICES=0 python train.py --task_type NLG --CL EWC --reg 0.01 --n_epochs 1
CUDA_VISIBLE_DEVICES=0 python train.py --task_type NLG --CL AGEM --episodic_mem_size 100 --reg 1.0 --n_epochs 1
CUDA_VISIBLE_DEVICES=0 python train.py --task_type NLG --CL LAMOL --percentage_LAM0L 200 --n_epochs 1
CUDA_VISIBLE_DEVICES=0 python train.py --task_type NLG --CL REPLAY --episodic_mem_size 100 --n_epochs 1
CUDA_VISIBLE_DEVICES=0 python train.py --task_type NLG --CL ADAPTER --bottleneck_size 75 --lr 6.25e-3 --n_epochs 1
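
The REPLAY strategy described above (sample a fixed number of examples per task, then replay them while learning later tasks) can be illustrated with a minimal sketch. This is not the repository's implementation: the function names, the toy task data, and the way memory is mixed into each new task are assumptions made for illustration; only the idea of a per-task episodic memory of size `episodic_mem_size` comes from the text above.

```python
import random


def build_episodic_memory(task_examples, mem_size, seed=0):
    """Randomly keep at most `mem_size` examples from a finished task."""
    rng = random.Random(seed)
    k = min(mem_size, len(task_examples))
    return rng.sample(task_examples, k)


def replay_training_stream(tasks, mem_size, seed=0):
    """Yield (task_name, training_data) pairs in task order.

    Each task is trained on its own data plus the examples stored
    from all previously seen tasks (hypothetical mixing scheme).
    """
    memory = []
    for name, examples in tasks:
        yield name, list(examples) + list(memory)
        # After the task is learned, store a small sample of it for later replay.
        memory.extend(build_episodic_memory(examples, mem_size, seed))


# Toy data standing in for three dialogue domains (hypothetical).
tasks = [
    ("domain_a", [f"a{i}" for i in range(50)]),
    ("domain_b", [f"b{i}" for i in range(30)]),
    ("domain_c", [f"c{i}" for i in range(5)]),
]

stream = list(replay_training_stream(tasks, mem_size=10))
sizes = [len(data) for _, data in stream]
for (name, _), size in zip(stream, sizes):
    print(name, size)
```

In this sketch the memory grows linearly with the number of tasks, which is the memory-size trade-off that the `--episodic_mem_size` flag controls.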



Example ADAPTER commands for each setting:

python train.py --task_type E2E --CL ADAPTER --bottleneck_size 300 --lr 6.25e-3 --n_epochs 10 --train_batch_size 10 --gradient_accumulation_steps 8
python train.py --task_type INTENT --CL ADAPTER --bottleneck_size 50 --lr 6.25e-3 --n_epochs 10 --train_batch_size 10 --gradient_accumulation_steps 8
python train.py --task_type NLG --CL ADAPTER --bottleneck_size 50 --lr 6.25e-3 --n_epochs 10 --train_batch_size 10 --gradient_accumulation_steps 8
python train.py --task_type DST --CL ADAPTER --bottleneck_size 100 --lr 6.25e-3 --n_epochs 10 --train_batch_size 10 --gradient_accumulation_steps 8
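
The `--bottleneck_size` values above determine how many parameters each adapter adds, which is one of the parameter-usage trade-offs mentioned in the abstract. The sketch below counts the parameters of a generic residual adapter (layer norm, down-projection, up-projection). The adapter layout, the hidden size of 768, and the 12 layers are assumptions chosen for illustration; they are not taken from this repository.

```python
def adapter_param_count(hidden_size, bottleneck_size, layer_norm=True):
    """Parameters of one generic residual adapter block (assumed layout)."""
    # Down-projection: weight matrix plus bias.
    down = hidden_size * bottleneck_size + bottleneck_size
    # Up-projection back to the hidden size: weight matrix plus bias.
    up = bottleneck_size * hidden_size + hidden_size
    # Optional layer norm: one scale and one shift vector.
    norm = 2 * hidden_size if layer_norm else 0
    return down + up + norm


HIDDEN_SIZE = 768  # assumption: hidden size of the backbone model
NUM_LAYERS = 12    # assumption: one adapter per transformer layer

for bottleneck in (50, 75, 100, 300):
    per_layer = adapter_param_count(HIDDEN_SIZE, bottleneck)
    print(bottleneck, per_layer, per_layer * NUM_LAYERS)
```

Under these assumptions the count grows linearly with the bottleneck size, so the E2E setting (bottleneck 300) adds roughly six times as many adapter parameters per domain as the INTENT and NLG settings (bottleneck 50).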


Example REPLAY commands for each setting:

python train.py --task_type E2E --CL REPLAY --episodic_mem_size 50 --lr 6.25e-5 --n_epochs 10 --train_batch_size 8 --gradient_accumulation_steps 8
python train.py --task_type INTENT --CL REPLAY --episodic_mem_size 50 --lr 6.25e-5 --n_epochs 10 --train_batch_size 8 --gradient_accumulation_steps 8
python train.py --task_type NLG --CL REPLAY --episodic_mem_size 50 --lr 6.25e-5 --n_epochs 10 --train_batch_size 8 --gradient_accumulation_steps 8
python train.py --task_type DST --CL REPLAY --episodic_mem_size 50 --lr 6.25e-5 --n_epochs 10 --train_batch_size 8 --gradient_accumulation_steps 8
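
The commands above combine `--train_batch_size` with `--gradient_accumulation_steps`. Under the usual meaning of gradient accumulation, gradients from several small batches are averaged before a single optimizer step, so the effective batch size is the product of the two values (64 for the REPLAY commands, 80 for the ADAPTER ones). How `train.py` implements this is not shown here; the toy one-parameter model below is only a sketch of the general technique.

```python
def grad_mse(w, batch):
    """Gradient of the mean squared error of y = w * x over one batch."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)


def accumulated_grad(w, data, batch_size, accumulation_steps):
    """Average the gradients of several micro-batches before one update."""
    total = 0.0
    for step in range(accumulation_steps):
        micro = data[step * batch_size:(step + 1) * batch_size]
        total += grad_mse(w, micro) / accumulation_steps
    return total


TRAIN_BATCH_SIZE = 8
GRADIENT_ACCUMULATION_STEPS = 8
effective_batch_size = TRAIN_BATCH_SIZE * GRADIENT_ACCUMULATION_STEPS

# Toy regression data for y = 3x (hypothetical).
data = [(float(i), 3.0 * i) for i in range(effective_batch_size)]
w = 0.5

accum = accumulated_grad(w, data, TRAIN_BATCH_SIZE, GRADIENT_ACCUMULATION_STEPS)
full = grad_mse(w, data)
print(effective_batch_size, accum, full)
```

Because all micro-batches have the same size, the accumulated gradient matches the gradient of one large batch, while only one micro-batch has to fit in memory at a time.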


To evaluate a trained model, run scorer.py on its checkpoint directory:

python scorer.py --model_checkpoint runs_INTENT/BEST/ --task_type INTENT
python scorer.py --model_checkpoint runs_DST/BEST/ --task_type DST
python scorer.py --model_checkpoint runs_NLG/BEST/ --task_type NLG
python scorer.py --model_checkpoint runs_E2E/BEST/ --task_type E2E
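
The four scoring commands above follow one pattern, so they can be generated in a loop. The sketch below only builds and prints the command lines; the helper name `scorer_command` is hypothetical, and the `runs_<TASK>/BEST/` layout is assumed from the commands above rather than verified against the code.

```python
import shlex

TASK_TYPES = ["INTENT", "DST", "NLG", "E2E"]


def scorer_command(task_type, run_name="BEST"):
    """Build the scorer.py invocation for one task type."""
    checkpoint = f"runs_{task_type}/{run_name}/"
    return [
        "python", "scorer.py",
        "--model_checkpoint", checkpoint,
        "--task_type", task_type,
    ]


commands = [scorer_command(task) for task in TASK_TYPES]
for cmd in commands:
    # Print the command; to execute it, pass `cmd` to subprocess.run(cmd, check=True).
    print(shlex.join(cmd))
```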


Method    INTENT      JGA         BLEU      EER
VANILLA   0.0303205   0.102345    10.3032   0.181644
L2        0.0346528   0.0923626   11.0159   0.189819
EWC       0.0283001   0.0998913   9.65351   0.203158
AGEM      0.102224    0.0965043   4.61297   0.360167
LAMOL     0.0262127   0.0923302   3.49649   0.35664
REPLAY    0.800088    0.394993    21.4832   0.0559855
ADAPTER   0.841951    0.37381     21.7719   0.163975
MULTI     0.875002    0.500357    26.1462   0.0341823
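
To make the gap to the MULTI upper bound concrete, the sketch below expresses the REPLAY and ADAPTER rows of the table above as a fraction of the MULTI row. The values are copied from that table; treating the first numeric column as a higher-is-better score (it appears to be intent accuracy) is an assumption, and the helper name is hypothetical.

```python
# Rows copied from the results table above: four numeric columns per method.
FIRST_TABLE = {
    "REPLAY": (0.800088, 0.394993, 21.4832, 0.0559855),
    "ADAPTER": (0.841951, 0.37381, 21.7719, 0.163975),
    "MULTI": (0.875002, 0.500357, 26.1462, 0.0341823),
}


def relative_to_multi(table, column):
    """Score of each method divided by the MULTI score in the same column."""
    upper = table["MULTI"][column]
    return {
        method: values[column] / upper
        for method, values in table.items()
        if method != "MULTI"
    }


# Column 0 is assumed to be a higher-is-better metric.
rel = relative_to_multi(FIRST_TABLE, column=0)
for method, ratio in rel.items():
    print(f"{method}: {ratio:.1%} of MULTI")
```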


Method    INTENT      JGA         BLEU      EER
VANILLA   0.0264631   0.0986375   6.45      0.499676
L2        0.0239718   0.069225    6.02459   0.553715
EWC       0.025299    0.101422    4.72      0.572742
AGEM      0.303349    0.109677    4.66216   0.651552
LAMOL     0.0269017   0.0939656   3.55622   0.638889
REPLAY    0.785325    0.297534    16.2668   0.190309
ADAPTER   0.906857    0.35059     16.5768   0.331949
MULTI     0.954546    0.488995    23.6073   0.12558


I would like to thank Saujas Vaduguru, Qi Zhu, and Maziar Sargordi for helping with debugging the code.