Coronavirus COVID-19 (2019-nCoV) Epidemic Datasets

Primary language: R. License: GNU General Public License v3.0 (GPL-3.0).

The repository aims to unify COVID-19 datasets across different sources in order to simplify data acquisition and the subsequent analysis. You are welcome to join and contribute by extending the number of supported data sources as a joint effort against COVID-19.

The data are available to the end user via the R package COVID19 or in CSV format (see below or on Kaggle).

About

Goal

Provide the research community with a unified data hub by collecting worldwide fine-grained data merged with demographics, air pollution, and other exogenous variables helpful for a better understanding of COVID-19.

How

The data are collected with the R package COVID19. For R users, the COVID19 package is the recommended way to interact with the dataset. For non-R users, the data are provided in CSV format and regularly updated (see below or on Kaggle).

Join the mission

Whether or not you are an R user, take part in the data collection! Your contribution will be gratefully acknowledged. See how to contribute.

R Package COVID19

A simple yet effective R package for acquiring tidy-format datasets of the 2019 Novel Coronavirus COVID-19 (2019-nCoV) epidemic. The data are downloaded in real time, cleaned, and matched with exogenous variables.

Quickstart

# Install COVID19
install.packages("COVID19")

# Load COVID19
library("COVID19")

Usage

covid19(ISO = NULL, level = 1, start = "2019-01-01", end = Sys.Date(), vintage = FALSE, raw = FALSE, cache = TRUE)

Arguments

Argument Description
ISO vector of ISO codes to retrieve (alpha-2, alpha-3 or numeric). Each country is identified by one of its ISO codes.
level integer. Granularity level. 1: country-level data. 2: state-level data. 3: city-level data.
start the start date of the period of interest.
end the end date of the period of interest.
vintage logical. Retrieve the snapshot of the dataset at the end date instead of using the latest version? Default FALSE.
raw logical. Skip data cleaning? Default FALSE.
cache logical. Memory caching? Significantly improves performance on successive calls. Default TRUE.

Details

The raw data are cleaned by filling missing dates with NA values. This ensures that all countries share the same grid of dates and no single day is skipped. Then, NA values are replaced with the previous non-NA value or 0 for the following variables:

  • deaths, confirmed, tests, recovered, icu, hosp, vent
  • driving, walking, transit
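
The two cleaning steps described above can be illustrated with a minimal base R sketch. It is not the package's actual implementation: the toy data frame and the helper `fill_forward` are hypothetical and only mirror the stated behaviour (complete date grid, then previous non-NA value or 0).

```r
# Toy input (hypothetical): location A skips 2020-03-03, location B starts late.
raw <- data.frame(
  id        = c("A", "A", "A", "B"),
  date      = as.Date(c("2020-03-01", "2020-03-02", "2020-03-04", "2020-03-02")),
  confirmed = c(1, 3, 7, 5),
  stringsAsFactors = FALSE
)

# Hypothetical helper: replace each NA with the previous non-NA value,
# or with 0 when no earlier observation exists.
fill_forward <- function(x) {
  last <- 0
  for (i in seq_along(x)) {
    if (is.na(x[i])) x[i] <- last else last <- x[i]
  }
  x
}

# 1. Build the full id x date grid so that all locations share the same dates.
dates <- seq(min(raw$date), max(raw$date), by = "day")
grid  <- expand.grid(id = unique(raw$id), date = dates, stringsAsFactors = FALSE)

# 2. Left-join the observations; days without data become NA.
clean <- merge(grid, raw, by = c("id", "date"), all.x = TRUE)
clean <- clean[order(clean$id, clean$date), ]

# 3. Fill the NA values within each location.
clean$confirmed <- ave(clean$confirmed, clean$id, FUN = fill_forward)
print(clean)
```

In this toy example the missing day for A takes the value of the previous day, while the day before B's first observation is set to 0.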

If no data are available at a given granularity level (country/state) but are available at a finer level (state/city), the coarser-level data are obtained by aggregating the finer-level data.
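
As a rough illustration of this aggregation, the sketch below sums state-level counts into country-level counts with base R. The data frame and its values are made up for the example, and summing is an assumption that suits count variables only; the package may treat other variables differently.

```r
# Toy state-level data (hypothetical): two states of one country on two dates.
states <- data.frame(
  country   = "X",
  state     = c("S1", "S2", "S1", "S2"),
  date      = as.Date(c("2020-03-01", "2020-03-01", "2020-03-02", "2020-03-02")),
  confirmed = c(10, 5, 12, 9),
  deaths    = c(1, 0, 1, 2),
  stringsAsFactors = FALSE
)

# Sum the counts over states to obtain one row per country and date.
by_country <- aggregate(cbind(confirmed, deaths) ~ country + date,
                        data = states, FUN = sum)
print(by_country)
```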

Examples

# Worldwide data by country
covid19()

# Worldwide data by state
covid19(level = 2)

# US data by state
covid19("USA", level = 2)

# Swiss data by state (cantons)
covid19("CHE", level = 2)

# Italian data by state (regions)
covid19("ITA", level = 2)

# Italian and US data by city
covid19(c("ITA","USA"), level = 3)

Dataset

Variable Description
id location identifier.
date observation time.
deaths cumulative number of deaths.
confirmed cumulative number of confirmed cases.
tests cumulative number of tests.
recovered cumulative number of patients released from hospitals or reported recovered.
hosp number of hospitalized patients on date.
icu number of hospitalized patients in ICUs on date.
vent number of patients requiring invasive ventilation on date.
driving relative volume of (driving) directions requests compared to a baseline volume on January 13th, 2020. https://www.apple.com/covid19/mobility
walking relative volume of (walking) directions requests compared to a baseline volume on January 13th, 2020. https://www.apple.com/covid19/mobility
transit relative volume of (transit) directions requests compared to a baseline volume on January 13th, 2020. https://www.apple.com/covid19/mobility
country administrative area of top level.
state administrative area of a lower level, usually states, regions or cantons.
city administrative area of a lower level, usually cities or municipalities.
lat latitude.
lng longitude.
pop total population.
pop_14 population ages 0-14 (% of total population)*.
pop_15_64 population ages 15-64 (% of total population)**.
pop_65 population ages 65+ (% of total population).
pop_age median age of population.
pop_density population density per km².
pop_death_rate population mortality rate.

* Switzerland: ages 0-19

** Switzerland: ages 20-64
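
Because deaths, confirmed, tests and recovered are cumulative, daily figures have to be derived by differencing. The following base R sketch shows one way to do this on a toy data frame that uses the documented column names; the values are invented, and treating the first cumulative value as the first daily increment is an assumption.

```r
# Toy data (hypothetical values) with the documented columns id, date, confirmed, pop.
x <- data.frame(
  id        = c("A", "A", "A", "B", "B", "B"),
  date      = rep(as.Date("2020-03-01") + 0:2, times = 2),
  confirmed = c(1, 3, 7, 0, 5, 5),
  pop       = c(1000, 1000, 1000, 500, 500, 500),
  stringsAsFactors = FALSE
)
x <- x[order(x$id, x$date), ]

# Daily new cases: first difference of the cumulative series within each location.
x$new_confirmed <- ave(x$confirmed, x$id, FUN = function(v) c(v[1], diff(v)))

# Cumulative cases per 100,000 inhabitants.
x$confirmed_per_100k <- x$confirmed * 1e5 / x$pop
print(x)
```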

CSV Data Files

CSV datasets of the 2019 Novel Coronavirus COVID-19 (2019-nCoV) epidemic. The files are generated with the R package COVID19 and updated daily.
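
R users who prefer the CSV files to the package can read them with base R. The sketch below writes a tiny stand-in file to a temporary path, since the actual file location is not assumed here; the column layout (id, date, deaths, confirmed) and the ISO date format are assumptions based on the variable list above.

```r
# Stand-in for a downloaded file: a tiny CSV with a few of the documented columns.
path <- tempfile(fileext = ".csv")
writeLines(c(
  "id,date,deaths,confirmed",
  "AAA,2020-03-01,1,10",
  "AAA,2020-03-02,2,15"
), path)

# Read with explicit column types so that dates are parsed as Date, not character.
x <- read.csv(path, colClasses = c(
  id = "character", date = "Date", deaths = "numeric", confirmed = "numeric"
))
str(x)
```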

Clean data

Raw data

Data coverage

Help improve the data coverage and add new countries and variables. See how to contribute.

Coverage is tracked for each variable (deaths, confirmed, tests, recovered, hosp, icu, vent, driving, walking, transit, lat, lng, pop, pop_14, pop_15_64, pop_65, pop_age, pop_density, pop_death_rate) at the following levels:

  • World: by country
  • US: by state, by city
  • Italy: by region, by city
  • Switzerland: by canton

Data sources

The following sources are gratefully acknowledged for making the data available to the public.

deaths confirmed tests recovered hosp icu vent driving walking transit lat lng pop pop_14 pop_15_64 pop_65 pop_age pop_density pop_death_rate
World
by country Center For Systems Science and Engineering at JHU Center For Systems Science and Engineering at JHU Center For Systems Science and Engineering at JHU Apple mobility trends Apple mobility trends Apple mobility trends Center For Systems Science and Engineering at JHU Center For Systems Science and Engineering at JHU World Bank Open Data (2018) World Bank Open Data (2018) World Bank Open Data (2018) World Bank Open Data (2018) World Factbook by CIA (2018) World Bank Open Data (2018) World Bank Open Data (2018)
US
by state Center For Systems Science and Engineering at JHU Center For Systems Science and Engineering at JHU Center For Systems Science and Engineering at JHU Center For Systems Science and Engineering at JHU Center For Systems Science and Engineering at JHU
by city Center For Systems Science and Engineering at JHU Center For Systems Science and Engineering at JHU Center For Systems Science and Engineering at JHU Center For Systems Science and Engineering at JHU Center For Systems Science and Engineering at JHU
Italy
by region Ministero della Salute Ministero della Salute Ministero della Salute Ministero della Salute Ministero della Salute Ministero della Salute Istituto Nazionale di Statistica (2018) Istituto Nazionale di Statistica (2018) Istituto Nazionale di Statistica (2018) Istituto Nazionale di Statistica (2018) Istituto Nazionale di Statistica (2018) Istituto Nazionale di Statistica (2018) Istituto Nazionale di Statistica (2018) Istituto Nazionale di Statistica (2018) Istituto Nazionale di Statistica (2018)
by city Ministero della Salute Istituto Nazionale di Statistica (2018) Istituto Nazionale di Statistica (2018) Istituto Nazionale di Statistica (2018) Istituto Nazionale di Statistica (2018) Istituto Nazionale di Statistica (2018) Istituto Nazionale di Statistica (2018) Istituto Nazionale di Statistica (2018) Istituto Nazionale di Statistica (2018) Istituto Nazionale di Statistica (2018)
Switzerland
by canton Open Government Data Open Government Data Open Government Data Open Government Data Open Government Data Open Government Data Open Government Data Swiss Federal Statistical Office (2018) Swiss Federal Statistical Office (2018) Swiss Federal Statistical Office (2018) Swiss Federal Statistical Office (2018) Swiss Federal Statistical Office (2018) Swiss Federal Statistical Office (2018) Swiss Federal Statistical Office (2018) Swiss Federal Statistical Office (2018) Swiss Federal Statistical Office (2018)

Acknowledgements

The following people have contributed to the data collection as a joint effort against COVID-19.

deaths confirmed tests recovered hosp icu vent driving walking transit lat lng pop pop_14 pop_15_64 pop_65 pop_age pop_density pop_death_rate
World
by country E.Guidotti E.Guidotti E.Guidotti E.Guidotti E.Guidotti E.Guidotti E.Guidotti E.Guidotti E.Guidotti E.Guidotti E.Guidotti E.Guidotti E.Guidotti E.Guidotti E.Guidotti
US
by state E.Guidotti E.Guidotti E.Guidotti E.Guidotti E.Guidotti
by city E.Guidotti E.Guidotti E.Guidotti E.Guidotti E.Guidotti
Italy
by region E.Guidotti E.Guidotti E.Guidotti E.Guidotti E.Guidotti E.Guidotti E.Guidotti E.Guidotti E.Guidotti E.Guidotti E.Guidotti E.Guidotti E.Guidotti E.Guidotti E.Guidotti
by city E.Guidotti E.Guidotti E.Guidotti E.Guidotti E.Guidotti E.Guidotti E.Guidotti E.Guidotti E.Guidotti E.Guidotti
Switzerland
by canton E.Guidotti E.Guidotti E.Guidotti E.Guidotti E.Guidotti E.Guidotti E.Guidotti E.Guidotti E.Guidotti E.Guidotti E.Guidotti E.Guidotti E.Guidotti E.Guidotti E.Guidotti E.Guidotti

Use Cases

Citation

Emanuele Guidotti, “Coronavirus COVID-19 (2019-nCoV) Epidemic Datasets.” Kaggle, doi: 10.34740/KAGGLE/DS/574488.