DC3 is a collection of 31 extremely difficult diagnostic case challenges that were manually compiled and solved by clinical experts. Each case comprises a number of temporally ordered, physician-generated observations alongside the eventually confirmed true diagnosis. We additionally provide inferred dense relevance judgments for these cases in the PubMed collection of scholarly biomedical articles.
The dataset is described in detail in our ICTIR 2019 paper.
- python > 2.7
- requests
- json
- datetime
- bs4
For copyright reasons, we cannot share the collection directly; instead, we provide a Python script that scrapes it for you. Running the following command:
python download.py
will generate the dc3.json file containing all 31 case-related topics.
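The internal layout of dc3.json is not described here, so the following minimal sketch only loads the file and reports its top-level shape without assuming any field names. The helper names load_cases and summarize are hypothetical, and Python 3 is assumed.

```python
import json


def load_cases(path="dc3.json"):
    """Load the scraped collection from disk and return the parsed JSON."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def summarize(data):
    """Return the top-level container type name and its number of entries."""
    return type(data).__name__, len(data)


# Example usage, after download.py has produced dc3.json:
#   data = load_cases()
#   print(summarize(data))
```

Inspecting the top-level shape first is a cautious way to find out whether the cases are stored as a list or keyed by an identifier before writing any code that depends on specific fields.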
To evaluate your diagnostic decision support system, qrels.txt contains inferred dense relevance judgments for the 2018 snapshot of the National Library of Medicine's PubMed database in trec_eval format.
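As a rough illustration of working with the judgments, the sketch below parses qrels lines in the standard trec_eval layout of four whitespace-separated columns: topic, iteration, document ID, relevance. It assumes that qrels.txt follows this layout with integer relevance labels; parse_qrels and load_qrels are hypothetical helper names, not part of this repository.

```python
from collections import defaultdict


def parse_qrels(lines):
    """Parse trec_eval qrels lines into {topic: {doc_id: relevance}}.

    Assumes four whitespace-separated columns per line:
    topic, iteration (ignored), document ID, integer relevance.
    """
    qrels = defaultdict(dict)
    for line_no, line in enumerate(lines, start=1):
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        if len(parts) != 4:
            raise ValueError("line %d: expected 4 columns, got %d" % (line_no, len(parts)))
        topic, _iteration, doc_id, relevance = parts
        qrels[topic][doc_id] = int(relevance)
    return dict(qrels)


def load_qrels(path="qrels.txt"):
    """Read a qrels file from disk and parse it."""
    with open(path) as f:
        return parse_qrels(f)


# Example usage:
#   qrels = load_qrels()
#   print(len(qrels), "topics with judgments")
```

Topic and document IDs are kept as strings so that they can be compared directly against the identifiers in a run file.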
If you want to refer to DC3, please cite:
@INPROCEEDINGS{eickhoff2019diagnostic,
title={{DC$^3$ -- A Diagnostic Case Challenge Collection}},
author={Eickhoff, Carsten and Gmehlin, Florian and Patel, Anu and Boullier, Jocelyn and Fraser, Hamish},
booktitle={{Proceedings of the 5th ACM SIGIR International Conference on the Theory of Information Retrieval (ICTIR)}},
year={2019},
organization={ACM}
}