Coronavirus Disease 2019 (COVID-19) Clinical Data Repository

This is an effort to compile a repository of the clinical characteristics of patients who have taken a COVID-19 test. By sharing our schema and data, we hope that we can 1) accelerate information sharing among frontline healthcare providers and 2) facilitate studies on COVID-19 signs, symptoms, stages, and care plans.

The Repository

The repository is maintained as CSV files and complies with the HIPAA Privacy Rule's De-Identification Standard. In addition, each patient's reported age differs from their actual age by a reasonable randomized amount to protect their privacy.

The data includes clinical characteristics (epidemiologic factors, comorbidities, vitals, clinician-assessed symptoms, patient-reported symptoms), in addition to radiological and laboratory findings. It does not include treatment plans, complications, or clinical outcomes, which are collected at inpatient facilities. Details about each field are available in the data dictionary.

The data includes both positive and negative test results for symptomatic and asymptomatic patients. It does not include results for patients with severe symptoms, because we refer such patients to the ER.

Note that our data collection is clinically driven and therefore not systematic. This means that overall positive rates describe the Carbon Health patient population and cannot be generalized to the unobserved population. We provide functions to identify symptom severity to help account for the various admission criteria that affect positive rates.

Refresh Cadence and Organization

  • Each batch, stored as a CSV file, contains a week's worth of results from Carbon Health and Braid Health.
  • The first file, prefixed with 04-07, contains a month's worth of results starting from 03-07.
  • Each filename is prefixed with the most recent day (mm-dd) included in the batch, matching batch_date.
  • Each row contains the clinical characteristics of a patient who has taken a COVID-19 test.
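The naming convention above can be used to put batches in chronological order. The following is a minimal sketch, not code from the repository: the helper names `batch_end_date` and `sort_batches`, the example filenames, and the default year of 2020 are all assumptions (the mm-dd prefix carries no year).

```python
import re
from datetime import date
from pathlib import Path

# Matches an mm-dd prefix at the start of a filename, e.g. "04-07".
_PREFIX = re.compile(r"^(\d{2})-(\d{2})")


def batch_end_date(filename, year=2020):
    """Return the most recent day covered by a batch, read from its mm-dd prefix.

    The year is an assumption: the prefix itself does not include one.
    """
    match = _PREFIX.match(Path(filename).name)
    if match is None:
        raise ValueError(f"no mm-dd prefix in {filename!r}")
    month, day = int(match.group(1)), int(match.group(2))
    return date(year, month, day)


def sort_batches(filenames, year=2020):
    """Sort batch filenames from oldest to newest by their prefix date."""
    return sorted(filenames, key=lambda name: batch_end_date(name, year))


# Hypothetical filenames, used only to illustrate the prefix convention.
ordered = sort_batches(["10-20_example.csv", "04-07_example.csv", "04-14_example.csv"])
print(ordered)
```

Sorting by the parsed date rather than by the raw string keeps the order correct if batches from more than one year are ever combined, provided the right year is passed in.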

Supplementary Material

  • Motivations, details, and next steps are available in this discussion from May 2020.
  • Data preparation and summarization functions are available in these two notebooks. For example, below is a visualization of the fill rate for all variables within the data repository.

Please note that the fill rates of vitals and clinician-assessed symptoms have dropped in the more recently published data batches because more tests now come from mobile clinics, pop-up clinics, and home test kits. Clinicians are not present at these locations at the time of specimen collection, so vitals are not taken and exams are not performed.
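To check fill rates on a given batch yourself, a per-column calculation along the following lines may help. This is a sketch using only the Python standard library, not the code from the notebooks: the function name `fill_rates` is hypothetical, the column names `temperature` and `cough` in the toy data are placeholders, and it assumes a missing value appears as an empty cell.

```python
import csv
import io


def fill_rates(csv_file):
    """Return {column: fraction of rows with a non-empty value}.

    Assumes a missing value is stored as an empty (or whitespace-only) cell.
    """
    reader = csv.DictReader(csv_file)
    columns = reader.fieldnames or []
    filled = {name: 0 for name in columns}
    total = 0
    for row in reader:
        total += 1
        for name in columns:
            value = row.get(name)
            if value is not None and value.strip() != "":
                filled[name] += 1
    if total == 0:
        return {name: 0.0 for name in columns}
    return {name: filled[name] / total for name in columns}


# Toy data with placeholder column names; real batches follow the data dictionary.
sample = io.StringIO(
    "batch_date,temperature,cough\n"
    "2020-04-07,37.1,True\n"
    "2020-04-07,,False\n"
    "2020-04-07,,\n"
    "2020-04-07,36.8,True\n"
)
rates = fill_rates(sample)
for column, rate in rates.items():
    print(f"{column}: {rate:.0%}")
```

For a real batch, pass an open file handle instead of the `io.StringIO` sample, for example `open(path, newline="")`. Comparing the result across batches would show the drop in vitals and clinician-assessed fields described above.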

Data Contributors and Supporters

Carbon Health — Clinical characteristics and laboratory findings

Data Dictionary

Braid Health — Chest x-rays, findings, labels, and clinician impressions

Sample Chest X-ray

Supporters

Special thanks to Eren Bali, Kevin Quennesson, Nigam Shah, Andrew Therriault, Omer Koren, and Andrew Pikul for their support of this effort and for their feedback.

Call for Data

To ensure this data is representative of cases with varying severity levels and symptoms, we are requesting data from outpatient test centers and inpatient healthcare facilities that are treating COVID-19 patients. Please use the data dictionary to prepare the data, and send data and inquiries to covidclinicaldata@carbonhealth.com.

Research Contributors

Please share any studies on this data via email or a pull request.

Call for Research

Please use the format below to cite the data repository in your studies.

@dataset{2020covidclinicaldata,
  author =       {Carbon Health and Braid Health},
  title =        {Coronavirus Disease 2019 (COVID-19) Clinical Data Repository},
  howpublished = {Accessed from \url{https://covidclinicaldata.org/}.},
  year =         2020,
  version =      {10-20-2020}
}

Data Sharing Agreement

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
