End-To-End Task-Completion Dialogue Challenge

SLT 2018 Special Session - Microsoft Dialogue Challenge: Building End-to-End Task-Completion Dialogue Systems

News

  • 12/18/2018 – 12/21/2018: SLT Workshop
    - Dec. 18, 1:00PM - 2:00PM: Invited talks: 1hr, Speakers: Dilek Hakkani-Tur (Amazon) and Gokhan Tur (Uber)
    - Dec. 18, 2:00PM - 2:45PM: Oral presentations: 45mins
    - Dec. 18, 2:45PM - 4:15PM: Coffee/Poster/Demo session: 1.5hr
    - Dec. 18, 4:15PM - 5:00PM: Panel discussion: 45mins, Panelists: Alex Acero (Apple), Jianfeng Gao (Microsoft), Dilek Hakkani-Tur (Amazon), and Gokhan Tur (Uber)
  • 11/25/2018: Paper acceptance announcement.
  • 11/18/2018: Paper submission. Call for Papers.
  • 11/11/2018: Results (including human evaluation) Announcement.
  • 10/25/2018: System submission (https://msrprograms.cloudapp.net/MDC2018)
  • 08/03/2018: Movie domain is up, see cmd.md for instruction.
  • 07/28/2018: Restaurant and Taxi domains: Data and Simulators are up, see cmd.md for instruction.
  • 07/16/2018: Registration is now open.
  • 07/06/2018: Task description is up.

Task

This special session introduces a Dialogue Challenge for building end-to-end task-completion dialogue systems, with the goal of encouraging the dialogue research community to collaborate and benchmark on standard datasets and unified experimental environment. In this special session, we will release human-annotated conversational data in three domains (movie-ticket booking, restaurant reservation, and taxi booking), as well as an experiment platform with built-in simulators in each domain, for training and evaluation purposes. The final submitted systems will be evaluated both in simulated setting and by human judges.

Please check this description for more details about the task.

Data

In this dialogue challenge, we will release well-annotated datasets for three task-completion domains: movie-ticket booking, restaurant reservation, and taxi ordering. The statistics of the three datasets are shown below.

Task                    Intents  Slots  Dialogues
Movie-Ticket Booking    11       29     2890
Restaurant Reservation  11       30     4103
Taxi Ordering           11       29     3094

Evaluation

As described in the task description (Section 4), we will evaluate the dialogue systems using both automatic and human evaluations on three criteria.

  • Success Rate: the fraction of dialogues that end successfully.
  • Average Turns: the average number of turns per dialogue.
  • Average Reward: the average reward received over the course of a dialogue.

The three metrics are strongly correlated: generally speaking, a good policy should achieve a high success rate, a high average reward, and a low number of average turns. We choose success rate as the primary evaluation metric.
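A minimal sketch of how these three metrics could be computed over a batch of completed dialogues (the `success`, `turns`, and `reward` record fields are hypothetical names for illustration, not the released data format):

```python
def evaluate(dialogues):
    """Compute (success rate, average turns, average reward)
    over a list of dialogue records."""
    n = len(dialogues)
    success_rate = sum(d["success"] for d in dialogues) / n
    avg_turns = sum(d["turns"] for d in dialogues) / n
    avg_reward = sum(d["reward"] for d in dialogues) / n
    return success_rate, avg_turns, avg_reward

# Three toy dialogue logs: two successes, one failure.
logs = [
    {"success": True,  "turns": 12, "reward": 40.0},
    {"success": False, "turns": 40, "reward": -20.0},
    {"success": True,  "turns": 16, "reward": 36.0},
]
print(evaluate(logs))
```

Note how the correlation shows up even in this toy example: the failed dialogue is also the one with the most turns and the lowest reward.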

We will also conduct a human evaluation for the competition: human judges will be asked to interact with the final systems submitted by participants. In addition to the metrics above, at the end of each dialogue session each judge will rate the system on a scale of 1 to 5 based on its naturalness, coherence, and task-completion capability.

Baseline Agents

  • A rule-based agent is provided.
  • A standard RL agent (DQN model) is provided.
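To illustrate what a rule-based policy looks like (this is a generic, hypothetical sketch, not the code of the provided baseline), a typical approach is to request each unfilled slot in a fixed order and issue a closing act once all constraints are known. The slot and act names below are made up for the example:

```python
class RuleAgent:
    """Toy rule-based dialogue policy: request unfilled slots
    in a fixed order, then signal task completion."""

    # Hypothetical slot names and ordering, for illustration only.
    REQUEST_ORDER = ["moviename", "date", "starttime", "numberofpeople"]

    def next_action(self, state):
        # state maps slot name -> value the user has provided so far
        for slot in self.REQUEST_ORDER:
            if slot not in state:
                return {"act": "request", "slot": slot}
        # All constraints gathered: close out the task.
        return {"act": "inform", "slot": "taskcomplete"}

agent = RuleAgent()
print(agent.next_action({"moviename": "Zootopia"}))  # requests the "date" slot next
```

A rule-based agent like this gives a deterministic floor on performance; the DQN baseline instead learns which action to take from the reward signal described in the Evaluation section.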

System Submission Guidelines

Open an account at https://msrprograms.cloudapp.net/MDC2018 and create a submission containing an abstract, your code as a zip file (<100 MB), the trained agent model, and, if applicable, your NLU and NLG models. Include instructions for execution, as below. Submissions may be updated any number of times, but no later than 10/14/2018 11:59 PM PST.

Instructions for running the sample submission in the SubmissionSample folder:

  1. Extract the run.zip file (run.zip is created by zipping the contents of system/src).

  2. Run testrun.py to interact with the agent, as in the example below.

    python testrun.py --agt 0 --usr 1 --max_turn 40 --kb_path ./run/deep_dialog/data_movie/movie.kb.1k.v1.p --goal_file_path ./run/deep_dialog/data_movie/user_goals_first.v2.p --slot_set ./run/deep_dialog/data_movie/slot_set.txt --act_set ./run/deep_dialog/data_movie/dia_acts.txt --dict_path ./run/deep_dialog/data_movie/slot_dict.v1.p --nlg_model_path ./run/deep_dialog/models/nlg/movie/lstm_tanh_[1533529279.91]87_99_199_0.988.p --nlu_model_path ./run/deep_dialog/models/nlu/movie/lstm[1533588045.3]_38_38_240_0.998.p --diaact_nl_pairs ./run/deep_dialog/data_movie/dia_act_nl_pairs.v7.json --intent_err_prob 0.00 --slot_err_prob 0.00 --episodes 500 --act_level 0 --run_mode 0 --cmd_input_mode 0

Organizers

Reference

If you submit any system to this challenge or publish any other work making use of the resources provided on this project, we ask you to cite the following task description papers:

@article{li2018microsoft,
  title={Microsoft Dialogue Challenge: Building End-to-End Task-Completion Dialogue Systems},
  author={Li, Xiujun and Panda, Sarah and Liu, Jingjing and Gao, Jianfeng},
  journal={arXiv preprint arXiv:1807.11125},
  year={2018}
}

@article{li2016user,
  title={A User Simulator for Task-Completion Dialogues},
  author={Li, Xiujun and Lipton, Zachary C and Dhingra, Bhuwan and Li, Lihong and Gao, Jianfeng and Chen, Yun-Nung},
  journal={arXiv preprint arXiv:1612.05688},
  year={2016}
}

Contact

FAQ

  1. How to implement an agent: see here