The official implementation of the paper "GPT-Critic: Offline Reinforcement Learning for End-to-End Task-Oriented Dialogue Systems" (ICLR 2022).
First, create the conda environment, install the required packages, and unpack the data:

```shell
conda env create -f environment.yml
conda activate gpt-critic
python -m spacy download en_core_web_sm
unzip data.zip
```
Training can be started by running `main.py` as follows:

```shell
python main.py -mode train -algorithm $ALGORITHM -cfg iteration=$ITERATION seed=$SEED
```
- To choose among GPT-Critic, UBAR, Decision Transformer, and Weighted BC, set `$ALGORITHM` to `GPT-Critic`, `UBAR`, `DT`, or `WBC`, respectively.
- (Only for GPT-Critic) To choose the iteration, set `$ITERATION` to `0`, `1`, `2`, or `3`.
- To choose the random seed, set `$SEED` to `0`, `1`, or `2`.
(Example)

```shell
python main.py -mode train -algorithm GPT-Critic -cfg iteration=3 seed=0
```
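As a sketch, a full GPT-Critic training sweep over every iteration/seed pair could be scripted as below (this assumes `main.py` sits in the current directory; the loop only prints the commands so you can inspect them before launching actual runs):

```shell
#!/bin/sh
# Enumerate all GPT-Critic training runs: 4 iterations x 3 seeds = 12 commands.
# Drop the leading "echo" to actually start training instead of printing.
for ITERATION in 0 1 2 3; do
  for SEED in 0 1 2; do
    echo python main.py -mode train -algorithm GPT-Critic -cfg iteration=$ITERATION seed=$SEED
  done
done
```

Printing first is a deliberate dry-run pattern: each training run is long, so verifying the 12 generated commands before dispatching them (e.g. to a job scheduler) is cheap insurance.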
Evaluation can be started by running `main.py` in test mode:

```shell
python main.py -mode test -algorithm $ALGORITHM -cfg iteration=$ITERATION seed=$SEED
```
- To choose among GPT-Critic, UBAR, Decision Transformer, and Weighted BC, set `$ALGORITHM` to `GPT-Critic`, `UBAR`, `DT`, or `WBC`, respectively.
- (Only for GPT-Critic) To choose the iteration, set `$ITERATION` to `0`, `1`, `2`, or `3`.
- To choose the random seed, set `$SEED` to `0`, `1`, or `2`.
(Example)

```shell
python main.py -mode test -algorithm GPT-Critic -cfg iteration=3 seed=0
```
If this repository helps your academic research, please consider citing our paper:
```bibtex
@inproceedings{jang2022gptcritic,
  title={{GPT}-Critic: Offline Reinforcement Learning for End-to-End Task-Oriented Dialogue Systems},
  author={Youngsoo Jang and Jongmin Lee and Kee-Eung Kim},
  booktitle={International Conference on Learning Representations},
  year={2022},
  url={https://openreview.net/forum?id=qaxhBG1UUaS}
}
```
This code is adapted from the MultiWOZ dataset and the UBAR codebase. We appreciate their released dataset and code, which were very helpful to our research.