COVID-Dialogue-Dataset-English is an English medical dialogue dataset about COVID-19 and other types of pneumonia. Patients who are concerned that they may be infected with COVID-19 or another type of pneumonia consult doctors, who provide advice. There are 603 consultations. Each consultation consists of:

  • ID
  • URL
  • Description of patient’s medical condition
  • Dialogue

The dataset is built from icliniq.com, healthcaremagic.com, and healthtap.com, and all copyrights of the data belong to these websites.

COVID-Dialogue-Dataset-Chinese is a Chinese medical dialogue dataset about COVID-19 and other types of pneumonia. Patients who are concerned that they may be infected with COVID-19 or another type of pneumonia consult doctors, who provide advice. There are 1393 consultations. Each consultation consists of:

  • ID
  • URL
  • Description of patient’s medical condition
  • Dialogue
  • (Optional) Diagnosis and suggestions

The dataset is built from Haodf.com and all copyrights of the data belong to Haodf.com.
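The record layout listed for the two datasets can be sketched as a small Python data structure. This is a minimal illustration only: the on-disk file format is not described here, so the class name `Consultation`, the field names, the `(speaker, utterance)` representation of the dialogue, and the helper `has_diagnosis` are all hypothetical, and the sample values are placeholders rather than real data.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Consultation:
    """One patient-doctor consultation. All field names are hypothetical."""
    id: str
    url: str
    description: str  # description of the patient's medical condition
    # Assumption: the dialogue is held as ordered (speaker, utterance) pairs.
    dialogue: List[Tuple[str, str]]
    # Listed only for the Chinese dataset, and optional there.
    diagnosis_and_suggestions: Optional[str] = None


def has_diagnosis(record: Consultation) -> bool:
    """Return True when the optional diagnosis field is present and non-empty."""
    return bool(record.diagnosis_and_suggestions)


# Placeholder records, not taken from the datasets.
english_like = Consultation(
    id="placeholder-1",
    url="https://example.com/consultation/1",
    description="placeholder description",
    dialogue=[("patient", "placeholder question"), ("doctor", "placeholder answer")],
)
chinese_like = Consultation(
    id="placeholder-2",
    url="https://example.com/consultation/2",
    description="placeholder description",
    dialogue=[("patient", "placeholder question"), ("doctor", "placeholder answer")],
    diagnosis_and_suggestions="placeholder diagnosis",
)
```

Keeping the diagnosis field optional with a default of `None` lets one type cover both datasets, so code that iterates over records can check `has_diagnosis(record)` instead of branching on the language.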

Details of the datasets are described in this report.

If you find this dataset useful, please cite:

@article{ju2020CovidDialog,
  title={CovidDialog: Medical Dialogue Datasets about COVID-19},
  author={Ju, Zeqian and Chakravorty, Subrato and He, Xuehai and Chen, Shu and Yang, Xingyi and Xie, Pengtao},
  journal={https://github.com/UCSD-AI4H/COVID-Dialogue},
  year={2020}
}

Using the Chinese dataset, we developed a GPT-2-based COVID-19 dialogue generation model. The details are provided in this preprint.

If you find the code useful, please cite:

@article{zeng2020coviddialogmodel,
  title={Develop Medical Dialogue Systems for COVID-19},
  author={Zeng, Guangtao and Wu, Qingyang and Zhang, Yichen and Yu, Zhou and Xing, Eric and Xie, Pengtao},
  journal={https://github.com/UCSD-AI4H/COVID-Dialogue},
  year={2020}
}