This is an effort to compile a repository of the clinical characteristics of patients who have taken a COVID-19 test. By sharing our schema and data, we hope that we can 1) accelerate information sharing among frontline healthcare providers and 2) facilitate studies on COVID-19 signs, symptoms, stages, and care plans.
The repository is maintained as CSV files and is compliant with the HIPAA Privacy Rule's De-Identification Standard. Further, a patient's reported age differs from their actual age by a reasonable randomized amount to protect their privacy.
The data includes clinical characteristics (epidemiologic factors, comorbidities, vitals, clinician-assessed symptoms, patient-reported symptoms), in addition to radiological and laboratory findings. It does not include treatment plans, complications, or clinical outcomes, which are collected at inpatient facilities. Details about each field are available in the data dictionary.
The data includes both positive and negative test results for symptomatic and asymptomatic patients. The data does not include results for patients with severe symptoms, because we refer such patients to the ER.
It is important to note that our data collection is clinically driven and therefore not systematic. This means that overall positive rates are descriptive of the Carbon Health patient population and cannot be generalized to the unobserved population. We provide functions to identify symptom severity to aid in accounting for the various admission criteria that affect positive rates.
- Each batch, stored as a CSV file, contains a week's worth of results from Carbon Health and Braid Health.
- The first file, prefixed with 04-07, contains a month's worth of results starting from 03-07.
- Each filename is prefixed with the most recent day (mm-dd) included in the batch, matching `batch_date`.
- Each row contains the clinical characteristics of a patient who has taken a COVID-19 test.
- Motivations, details, and next steps are available in this discussion from May 2020.
- Data preparation and summarization functions are available in these two notebooks. For example, below is a visualization of the fill rate for all variables within the data repository.
Please note that the fill rates of vitals and clinician-assessed symptoms have dropped with the more recently published data batches due to the presence of a greater volume of mobile clinics, pop-up clinics, and home test kits. Clinicians are not present at these locations during the time of specimen collection, and therefore, vitals are not taken and exams are not provided.
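As a rough illustration of how fill rates could be computed per column across the batch files, here is a minimal sketch using only the Python standard library. It is not the code from the notebooks. It assumes that the batch files sit together in one directory, that the first five characters of each filename are the mm-dd prefix, and that an empty cell means the value was not recorded. The function names `read_batches` and `fill_rates` are hypothetical.

```python
import csv
from collections import Counter
from pathlib import Path


def read_batches(data_dir):
    """Yield (batch_prefix, row) for every row of every CSV batch in data_dir."""
    for path in sorted(Path(data_dir).glob("*.csv")):
        # Assumption: the filename starts with the mm-dd prefix, e.g. "04-07...".
        batch_prefix = path.name[:5]
        with path.open(newline="", encoding="utf-8") as handle:
            for row in csv.DictReader(handle):
                yield batch_prefix, row


def fill_rates(rows):
    """Return {column: fraction of rows with a non-empty value}."""
    filled = Counter()
    total = 0
    for row in rows:
        total += 1
        for column, value in row.items():
            if column is None:  # extra unnamed cells in a malformed row
                continue
            # Assumption: an empty or whitespace-only cell counts as missing.
            is_filled = isinstance(value, str) and value.strip() != ""
            filled[column] += int(is_filled)
    if total == 0:
        return {}
    return {column: count / total for column, count in filled.items()}
```

For example, `fill_rates(row for _, row in read_batches("data"))` would return one fraction per column for all batches in a hypothetical `data` directory. Grouping rows by `batch_prefix` first would show how the fill rates of vitals and clinician-assessed symptoms change between batches.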
- Website: Carbon Health
- Twitter: @CarbonHealth
- Email: covidclinicaldata@carbonhealth.com
- Notes:
- Carbon Health began COVID-19 testing with the SARS-CoV-2 RNA RT-PCR test on 03-04-20.
- The data includes the clinical characteristics (epi factors, comorbidities, vitals, clinician-assessed symptoms, patient-reported symptoms) and laboratory results of patients on the date of service.
- Acknowledgements:
- Data Science Team: Nosheen Moosvi, Rebekkah Ismakov, Pardis Noorzad
- Clinical Team: Greg Burrell, Haritha Atluri, Roger Wu, Caesar Djavaherian, Sujal Mandavia
Carbon Health Logo | Data Dictionary |
- Website: Braid Health
- Twitter: @BraidHealth
- Email: vivian@braid.health and k@braid.health
- Notes:
- Braid Health data is joined with Carbon Health data using the MRN and encounter ID fields, which are subsequently removed.
- The radiological data includes findings, clinician impressions, labels, and links to chest x-rays on the Braid Health website.
- The website UI allows for closer inspection by researchers and radiologists.
- The images can be downloaded for image processing and classification studies.
- Acknowledgements:
- Data Engineering: Kevin Quennesson, Daniel Hasegan, Üstün Özgür
- Product Design: Alessandro Sabatelli
- Clinical: Rajni Natesan, Vivian Liu
Braid Health Logo | Sample Chest X-ray |
Special thanks to Eren Bali, Kevin Quennesson, Nigam Shah, Andrew Therriault, Omer Koren, and Andrew Pikul for their support of this effort and for their feedback.
To ensure this data is representative of cases with varying severity levels and symptoms, we are requesting data from outpatient test centers and inpatient healthcare facilities that are treating COVID-19. Please use the data dictionary to prepare the data. Please send data and inquiries to covidclinicaldata@carbonhealth.com.
Please share any studies on this data via email or a pull request.
Please use the format below to cite the data repository in your studies.
@dataset{2020covidclinicaldata,
author = {Carbon Health and Braid Health},
title = {Coronavirus Disease 2019 (COVID-19) Clinical Data Repository},
howpublished = {Accessed from \url{https://covidclinicaldata.org/}.},
year = 2020,
version = {10-20-2020}
}
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.