COVID-19 Scenarios Data
Data preprocessing scripts and preprocessed data storage for COVID-19 Scenarios project
Got questions or suggestions?
Discover
Simulator | Source code repository | Data repository | Updates
Contents
Country codes
A list of countries with their associated regions, subregions, and three-letter codes, as designated by the U.N.
Population data
A list of settings used by the default scenario of the COVID-19 epidemic simulation for different regions of interest.
Case count data
The directory ./case-counts contains a structured set of TSV files with aggregated data for select countries and subregions/cities.
We welcome contributions to keep this data up to date.
The format chosen is:
time cases deaths hospitalized ICU recovered
2020-03-14 ...
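As a rough illustration of how such a file might be consumed, the sketch below reads this format with Python's standard csv module. It assumes the columns are tab-separated with a header row as shown above, that counts are whole numbers, and that empty entries mean missing data. The function name read_case_counts and the sample values are hypothetical and not part of this repository.

```python
import csv
import io

COLUMNS = ["time", "cases", "deaths", "hospitalized", "ICU", "recovered"]


def read_case_counts(handle):
    """Read a case-count TSV into a list of dicts; empty entries become None."""
    rows = []
    for record in csv.DictReader(handle, delimiter="\t"):
        row = {"time": record["time"]}
        for column in COLUMNS[1:]:
            # Missing or blank cells are kept as None rather than zero.
            value = (record.get(column) or "").strip()
            row[column] = int(value) if value else None
        rows.append(row)
    return rows


# Made-up sample values, for illustration only.
sample = (
    "time\tcases\tdeaths\thospitalized\tICU\trecovered\n"
    "2020-03-14\t10\t1\t\t\t2\n"
)
rows = read_case_counts(io.StringIO(sample))
print(rows[0])
```

Keeping missing entries as None instead of 0 preserves the difference between "no data reported" and "zero reported".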
We are actively looking for people to supply data to be used for our modeling!
Contributing and curating data:
Adding case count data for a new region:
The steps to follow are:
Identify a source for case count data that is updated frequently (at least daily) as the outbreak evolves.
- Write a script that downloads and converts raw data into TSV format
- Columns: [time, cases, deaths, hospitalized, ICU, recovered]
- Important: all columns must be cumulative data.
- The time column must be a string formatted as YYYY-MM-DD
- Try to keep the same order of columns for hygiene, although it should not ultimately matter
- If data is missing, please leave the entry empty
- Use the store_data() function in utils to store the data into .tsv and .json files automatically
- Place the script into the parsers directory
- The name should correspond to the region name desired in the scenario.
- There must be a function parse() defined that calls store_data() from utils
- Ensure that the path provided to store_data() is well formatted
- The structure of the directory is Region/Sub-Region/Country/
- Region and Sub-Region are designated as per the U.N.
- U.N. designations are found within country_codes.csv
- Please use only the U.N. designated name for the country, region, and sub-region.
Update the sources.json file to contain all relevant metadata.
- The three fields are:
- primarySource = The URL/path to the raw data
- dataProvenance = The organization behind the data collection
- license = The license governing the usage of data
Add population data for the additional regions/states.
Case count data is most useful when tied to data on the population it refers to. To ensure new case counts are correctly included in the population presets, add a line to populationData.tsv for each new region (see Adding/editing population data for a country and/or region below).
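The column rules in the steps above can be checked before the data is stored. The following is a minimal sketch, not the repository's own code: validate_rows and to_tsv are hypothetical helpers, the sample numbers are invented, and rows are assumed to be sorted by date. In an actual parser the rows would be handed to store_data() from utils, whose signature is not shown here.

```python
from datetime import datetime

COLUMNS = ["time", "cases", "deaths", "hospitalized", "ICU", "recovered"]


def validate_rows(rows):
    """Check the date format and that each count column never decreases.

    Assumes rows are already sorted by time (oldest first).
    """
    last_seen = {}
    for row in rows:
        # Round-trip through strptime/strftime so that only zero-padded
        # YYYY-MM-DD strings are accepted.
        parsed = datetime.strptime(row["time"], "%Y-%m-%d")
        if parsed.strftime("%Y-%m-%d") != row["time"]:
            raise ValueError(f"time is not YYYY-MM-DD: {row['time']}")
        for column in COLUMNS[1:]:
            value = row.get(column)
            if value is None:
                continue  # missing data is allowed and stays empty
            if column in last_seen and value < last_seen[column]:
                raise ValueError(f"{column} decreases on {row['time']}")
            last_seen[column] = value


def to_tsv(rows):
    """Render rows as TSV text, leaving missing entries empty."""
    lines = ["\t".join(COLUMNS)]
    for row in rows:
        cells = [row["time"]]
        for column in COLUMNS[1:]:
            value = row.get(column)
            cells.append("" if value is None else str(value))
        lines.append("\t".join(cells))
    return "\n".join(lines) + "\n"


# Invented sample data, for illustration only.
rows = [
    {"time": "2020-03-14", "cases": 10, "deaths": 1, "recovered": 2},
    {"time": "2020-03-15", "cases": 15, "deaths": 1, "hospitalized": 3, "recovered": 2},
]
validate_rows(rows)
tsv_text = to_tsv(rows)
print(tsv_text)
```

A decreasing value usually means the source reports daily counts rather than cumulative totals, which is worth catching before the file is committed.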
Updating/editing case count data for an existing region:
Manual entry is less preferred than a script that updates the data automatically, as outlined above. However, if no accessible data source exists, the data can be entered manually. To do so:
Commit a manually entered file into the correct directory
- The structure of the directory is Region/Sub-Region/Country/
- Region and Sub-Region are designated as per the U.N.
- U.N. designations are found within country_codes.csv
- Please use only the U.N. designated name for the country, region, and sub-region.
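The directory layout described above could be assembled as in the sketch below. This is an illustration only: region_directory is a hypothetical helper, the case-counts root is an assumption based on the directory named earlier, and the example names should be checked against country_codes.csv rather than taken from here.

```python
from pathlib import PurePosixPath


def region_directory(region, sub_region, country, root="case-counts"):
    """Build the Region/Sub-Region/Country directory path under root."""
    parts = [region, sub_region, country]
    if any(not part or not part.strip() for part in parts):
        raise ValueError("region, sub-region, and country must all be given")
    return PurePosixPath(root, *[part.strip() for part in parts])


# Example names; verify them against country_codes.csv before use.
path = region_directory("Europe", "Western Europe", "Switzerland")
print(path)
```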
Adding/editing population data for a country and/or region:
As of now, all data used to initialize the scenarios in our model is found within populationData.tsv. It has the following form:
name populationServed ageDistribution hospitalBeds ICUBeds suspectedCaseMarch1st importsPerDay
Switzerland ...
- name: the U.N. designated name found within country_codes.csv
- For a sub-region/city, please prefix the name with the three-letter country code of the containing country. See country_codes.csv for the correct letters.
- populationServed: a number with the population size
- ageDistribution: name of the country the region is within; must be the U.N. designated name
- hospitalBeds: number of hospital beds within the region
- ICUBeds: number of ICU beds
- suspectedCasesMarch1st: The number of cases thought to be within the region on March 1st.
- importsPerDay: number of suspected import cases per day
At least one of suspectedCasesMarch1st and importsPerDay needs to be non-zero. Otherwise there is no outbreak (good news in principle, but not useful for exploring scenarios).
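A new row can be sanity-checked against this rule before it is added. The sketch below is a hedged example, not part of the repository: check_population_row is a hypothetical helper, the numbers are invented, and because the header line and the field list above spell the suspected-cases column differently, the code accepts either spelling.

```python
# The header above uses suspectedCaseMarch1st, the field list uses
# suspectedCasesMarch1st; accept whichever one is present.
SUSPECTED_KEYS = ("suspectedCasesMarch1st", "suspectedCaseMarch1st")


def check_population_row(row):
    """Return a list of problems found in one populationData.tsv row."""
    problems = []
    suspected_raw = next((row[k] for k in SUSPECTED_KEYS if k in row), 0)
    suspected = float(suspected_raw or 0)
    imports = float(row.get("importsPerDay") or 0)
    if suspected == 0 and imports == 0:
        problems.append("suspected cases and importsPerDay are both zero")
    if float(row.get("populationServed") or 0) <= 0:
        problems.append("populationServed must be a positive number")
    return problems


# Invented values, for illustration only.
ok_row = {
    "name": "Exampleland",
    "populationServed": "1000000",
    "suspectedCasesMarch1st": "5",
    "importsPerDay": "0",
}
no_outbreak_row = dict(ok_row, suspectedCasesMarch1st="0")

print(check_population_row(ok_row))
print(check_population_row(no_outbreak_row))
```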