/virufy-covid

Primary language: Jupyter Notebook. License: GNU Affero General Public License v3.0 (AGPL-3.0).

Decoding hidden patterns in COVID-19 coughs with AI

Virufy is a nonprofit research organization developing artificial intelligence (AI) technology that screens for COVID-19 from cough patterns, rapidly and at no cost, using a smartphone. To learn more or get involved, visit our website. For more information, check out the resources available here.

This repository contains everything needed to get started on writing a COVID-19 detection model. The model takes cough audio files, and optionally additional metadata about a patient, and predicts whether or not the patient is infected with COVID-19.

The virufy_cdf_quickstart.ipynb notebook contains the setup to download the dataset, filter it to only labeled entries, and prepare the data for training and testing. It also includes instructions for saving your model.
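The filtering and train/test preparation steps can be sketched roughly as follows. This is a minimal illustration and not the notebook's actual code: the helper names `filter_labeled` and `split_by_patient` are hypothetical, treating only `positive` and `negative` as labeled is an assumption, and so is splitting by `patient_id` so that no patient appears in both sets.

```python
import random
from collections import defaultdict

# Assumption: a row counts as "labeled" only when the PCR result is definite.
LABELED_RESULTS = {"positive", "negative"}


def filter_labeled(rows):
    """Keep rows whose pcr_test_result is positive or negative."""
    return [r for r in rows if r.get("pcr_test_result") in LABELED_RESULTS]


def split_by_patient(rows, test_fraction=0.2, seed=0):
    """Split rows into (train, test) so that no patient is in both sets."""
    by_patient = defaultdict(list)
    for r in rows:
        by_patient[r["patient_id"]].append(r)

    # Shuffle patient IDs deterministically, then cut off the test share.
    patient_ids = sorted(by_patient)
    random.Random(seed).shuffle(patient_ids)
    n_test = max(1, round(len(patient_ids) * test_fraction)) if patient_ids else 0

    test = [r for pid in patient_ids[:n_test] for r in by_patient[pid]]
    train = [r for pid in patient_ids[n_test:] for r in by_patient[pid]]
    return train, test


if __name__ == "__main__":
    # Hypothetical rows using column names from the Common Data Format.
    sample = [
        {"patient_id": "p1", "pcr_test_result": "positive"},
        {"patient_id": "p2", "pcr_test_result": "untested"},
        {"patient_id": "p3", "pcr_test_result": "negative"},
        {"patient_id": "p4", "pcr_test_result": "pending"},
        {"patient_id": "p5", "pcr_test_result": "negative"},
    ]
    train, test = split_by_patient(filter_labeled(sample), test_fraction=0.34)
    print(len(train), "training rows,", len(test), "test rows")
```

Splitting by patient rather than by row is a design choice made here to avoid leaking a patient's recordings across the two sets. A plain row-level split would also work if each patient contributes a single recording.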

Questions?
Join our Slack support workspace

Virufy common data format

Virufy has defined a standardized data format for COVID coughs.

Datasets

The following datasets have been standardized using Virufy's Common Data Format. More data will be added as it becomes available.

  1. https://github.com/virufy/virufy-cdf-coughvid
  2. https://github.com/virufy/virufy-cdf-india-clinical-1
  3. https://github.com/virufy/virufy-cdf-coswara

Column structure

| Column | Description |
| --- | --- |
| row | Row number of the data. |
| source | Source of the cough data. |
| patient_id | Unique identifier for the patient. |
| cough_detected | The probability that the audio file contains an actual cough submission. |
| audio_path | The file path to the audio file containing the patient's cough submission. All of the cough audio files are in the cough folder. |
| audio_type | Either cough or speech. |
| age | The age of the patient. |
| biological_sex | The sex at birth of the patient. This can be male, female, or NaN. |
| reported_gender | The reported gender of the patient. |
| submission_date | Date the cough was submitted by the patient to Coughvid. |
| pcr_test_date | Date the PCR test for the presence of COVID-19 was taken. |
| pcr_result_date | Date the test result from the PCR test for the presence of COVID-19 was received. |
| respiratory_condition | Boolean indicator of whether or not the patient suffers from a respiratory condition. |
| fever_or_muscle_pain | Boolean indicator of whether or not the patient was suffering from fever or muscle pain. |
| pcr_test_result | Result of the patient's PCR test for the presence of COVID-19. This can be positive, negative, untested, or pending. |
| pcr_test_result_inferred | The best guess of a patient's COVID-19 diagnosis based on information specific to the dataset source. This can be positive, negative, untested, or pending. |
| covid_symptoms | Boolean indicator of whether or not the patient was experiencing symptoms of COVID-19. |
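
As an illustration of how these columns might be read, the sketch below parses CSV text with the standard library, converts a few typed columns, and keeps only rows that probably contain a cough. It rests on several assumptions: the metadata is stored as CSV, booleans are written as `True`/`False`, and the `0.8` threshold on `cough_detected` is an arbitrary example value. The helper names are hypothetical.

```python
import csv
import io
import math

BOOLEAN_COLUMNS = ("respiratory_condition", "fever_or_muscle_pain", "covid_symptoms")


def parse_bool(value):
    """Map textual booleans to True/False; anything else becomes None."""
    text = str(value).strip().lower()
    if text in {"true", "1"}:
        return True
    if text in {"false", "0"}:
        return False
    return None


def parse_float(value):
    """Convert to float; missing, malformed, or NaN values become None."""
    try:
        number = float(value)
    except (TypeError, ValueError):
        return None
    return None if math.isnan(number) else number


def load_rows(csv_text, min_cough_probability=0.8):
    """Parse CSV text and keep rows with cough_detected at or above the threshold."""
    rows = []
    for raw in csv.DictReader(io.StringIO(csv_text)):
        row = dict(raw)
        row["cough_detected"] = parse_float(raw.get("cough_detected"))
        row["age"] = parse_float(raw.get("age"))
        for column in BOOLEAN_COLUMNS:
            row[column] = parse_bool(raw.get(column))
        probability = row["cough_detected"]
        if probability is not None and probability >= min_cough_probability:
            rows.append(row)
    return rows


# Hypothetical sample data; real files may contain more columns.
SAMPLE = """patient_id,cough_detected,audio_path,age,respiratory_condition,fever_or_muscle_pain,covid_symptoms,pcr_test_result
p1,0.97,cough/a.webm,34,False,True,True,positive
p2,0.12,cough/b.ogg,51,False,False,False,negative
p3,0.88,cough/c.wav,NaN,True,False,,untested
"""

if __name__ == "__main__":
    for row in load_rows(SAMPLE):
        print(row["patient_id"], row["cough_detected"], row["pcr_test_result"])
```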

Note: The audio files containing the cough submissions come in a variety of formats, with file extensions including:

  • .webm
  • .ogg
  • .mp3
  • .m4a
  • .wav
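
Because the formats are mixed, it can help to check which extensions actually occur before choosing an audio decoder. The sketch below is one possible approach using only the standard library. The helper names are hypothetical, and it assumes `audio_path` values use forward slashes.

```python
from collections import Counter
from pathlib import PurePosixPath

# Extensions listed in the note above.
KNOWN_EXTENSIONS = {".webm", ".ogg", ".mp3", ".m4a", ".wav"}


def audio_extension(audio_path):
    """Return the lowercase file extension of an audio path, e.g. '.wav'."""
    return PurePosixPath(audio_path).suffix.lower()


def count_extensions(audio_paths):
    """Count how many files there are per extension."""
    return Counter(audio_extension(p) for p in audio_paths)


def unknown_paths(audio_paths):
    """List paths whose extension is not one of the known formats."""
    return [p for p in audio_paths if audio_extension(p) not in KNOWN_EXTENSIONS]


if __name__ == "__main__":
    # Hypothetical paths for illustration.
    paths = ["cough/a.webm", "cough/b.OGG", "cough/c.wav", "cough/notes.txt"]
    print(count_extensions(paths))
    print(unknown_paths(paths))
```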