/fetchers-python

Data source fetchers written in Python

Primary language: Python. License: Apache License 2.0 (Apache-2.0).

Oxford COVID-19 (OxCOVID19) Data Fetcher Repository

This is the Python data fetcher repository for the OxCOVID19 Database, a large, single-centre, multimodal database of information relating to the COVID-19 pandemic.

The OxCOVID19 Project (https://covid19.eng.ox.ac.uk/) aims to increase our understanding of the COVID-19 pandemic and to develop possible strategies for reducing its impact on society through the combined power of statistical and mathematical modelling and machine learning techniques. This repository contains the OxCOVID19 data source fetchers, written in Python 3.


Cite as: Adam Mahdi, Piotr Błaszczyk, Paweł Dłotko, Dario Salvi, Tak-Shing Chan, John Harvey, Davide Gurnari, Yue Wu, Ahmad Farhat, Niklas Hellmer, Alexander Zarebski, Lionel Tarassenko, Oxford COVID-19 Database: multimodal data repository for understanding the global impact of COVID-19. University of Oxford, 2020.


Currently implemented fetchers:

| Name | Country | Country Code | Data source | Status | Regional levels mapping |
| --- | --- | --- | --- | --- | --- |
| GOOGLE_MOBILITY | World | several | COVID-19 Community Mobility Reports - Google | release | adm_area_1, adm_area_2: depending on the country |
| APPLE_MOBILITY | World | several | COVID-19 Mobility Trends Reports - Apple | release | adm_area_1, adm_area_2: depending on the country |
| GOVTRACK | World | several | Oxford COVID-19 Government Response Tracker | release | NA |
| WEATHER | World | several | MET Informatics Lab | release | adm_area_1, adm_area_2, adm_area_3: depending on the country |
| WRD_ECDC | World | several | European Centre for Disease Prevention and Control | release | NA |
| POL_WIKI | Poland | POL | Wikipedia | release | adm_area_1: NA or voivodeship |
| ESP_MSVP | Spain | ESP | Ministerio de Sanidad | release | adm_area_1: comunidades autónomas |
| ZAF_DSFSI | South Africa | ZAF | Data Science for Social Impact research group, the University of Pretoria | release | adm_area_1: province |
| BRA_MSHM | Brazil | BRA | github: elhenrico | release | adm_area_1: province |
| SWE_GM | Sweden | SWE | github: elinlutz | release | adm_area_1: province |
| KOR_DS4C | South Korea | KOR | Data Science for COVID-19 in South Korea | release | adm_area_1: NA or province |
| AUS_C1A | Australia | AUS | The Real-time COVID-19 Status in Australia | release | adm_area_1: NA or state |
| POR_MSDS | Portugal | POR | Data Science for Social Good Portugal | release | adm_area_1: NA or province |
| GBR_PHTW | United Kingdom | GBR | Coronavirus (COVID-19) UK Historical Data | release | adm_area_1: NA or country, adm_area_2: NA or upper tier/health boards |
| CHE_OPGOV | Switzerland | CHE | Kanton Zürich Statistisches Amt | release | adm_area_1: canton |
| TUR_MHOE | Turkey | TUR | github: ozanerturk | release | adm_area_1: NA |
| BEL_LE | Belgium | BEL | github: eschnou | release | adm_area_1: NA |
| IND_COVIND | India | IND | COVID19-India API | release | adm_area_1: NA or state |
| CAN_GOV | Canada | CAN | Government of Canada | release | adm_area_1: province |
| IDN_GTPPC | Indonesia | IDN | Government of Indonesia - Coronavirus Disease Response Acceleration Task Force | release | adm_area_1: province |
| NLD_CW | Netherlands | NLD | CoronaWatchNL | release | adm_area_1: NA/province |
| LAT_DSRP | Latin America | several | Latin America Covid-19 Data Repository by DSRP | release | adm_area_1: subdivision |
| EU_ZH | Belgium | BEL | Novel Coronavirus Outbreak in Europe - Chinese language | candidate | adm_area_1: region; adm_area_2: province |
| EU_ZH | Austria | AUT | Novel Coronavirus Outbreak in Europe - Chinese language | release | adm_area_1: state |
| EU_ZH | Czech Republic | CZE | Novel Coronavirus Outbreak in Europe - Chinese language | release | adm_area_1: region |
| EU_ZH | Germany | DEU | Novel Coronavirus Outbreak in Europe - Chinese language | release | adm_area_1: state |
| EU_ZH | Hungary | HUN | Novel Coronavirus Outbreak in Europe - Chinese language | release | adm_area_1: NA |
| EU_ZH | Norway | NOR | Novel Coronavirus Outbreak in Europe - Chinese language | release | adm_area_1: county |
| EU_ZH | Poland | POL | Novel Coronavirus Outbreak in Europe - Chinese language | release | adm_area_1: voivodeship |
| EU_ZH | Slovenia | SVN | Novel Coronavirus Outbreak in Europe - Chinese language | release | adm_area_1: NA |
| EU_ZH | Sweden | SWE | Novel Coronavirus Outbreak in Europe - Chinese language | release | adm_area_1: province |
| NGA_SO | Nigeria | NGA | Covid-19 Nigeria API | release | adm_area_1: state |
| NGS_CDC | Nigeria | NGA | Nigeria Centre for Disease Control | release | adm_area_1: state |
| RUS_GOV | Russia | RUS | Russian Government | release | adm_area_1: federal subjects |
| ITA_PC | Italy | ITA | Protezione Civile | release | adm_area_1: Italian regions, adm_area_2: Italian provinces |
| ITA_PCDM | Italy | ITA | Davide Magno, from Protezione Civile | release | adm_area_1: Italian region |
| USA_NYT | United States | USA | New York Times | release | adm_area_1: US state, adm_area_2: county (exception is New York City, which includes more counties) |
| FRA_SPFCG | France | FRA | Cedric Guadalupe from Santé Publique France | release | adm_area_1: France "régions" |
| DEU_JPGG | Germany | DEU | Jan-Philip Gehrcke, from the Public Health Offices (Gesundheitsaemter) | release | adm_area_1: German "Länder" |
| PAK_GOV | Pakistan | PAK | Government of Pakistan | release | adm_area_1: province |
| GBR_PHE | United Kingdom | GBR | Public Health England | release | adm_area_3: English lower tier local authority |
| GBR_PHW | United Kingdom | GBR | Public Health Wales | candidate | adm_area_2: Welsh health board for deaths, local authority for tests |
| SWE_SIR | Sweden | SWE | Svenska Intensivvårdsregistret (SIR) | release | adm_area_1: Swedish counties (Län) |
| MYS_MHYS | Malaysia | MYS | ynshung | release | adm_area_1: NA or province |
| JPN_C1JACD | Japan | JPN | COVID-19 Japan Anti-Coronavirus Dashboard | release | adm_area_1: prefecture |
| USA_CTP | United States | USA | The COVID Tracking Project | release | adm_area_1: state |
| FRA_SPF | France | FRA | Données hospitalières relatives à l'épidémie de COVID-19 | candidate | adm_area_1: France "régions", adm_area_2: France "départements" |
| CHN_ICL | Mainland China | CHN | MRC Centre Imperial College London | candidate | adm_area_1: province or None |
| ESP_MS | Spain | ESP | Ministerio de Sanidad | candidate | adm_area_1: comunidades autónomas |
| IRL_HSPC | Ireland | IRL | Health Surveillance Protection Centre | candidate | adm_area_1: county |

Explanation of status:

  • Draft: being developed, should not be tested yet
  • Candidate: development complete, being tested on a private test database
  • Release: tested, data are fed into the official public database

Database structure

See https://covid19db.github.io/data.html

Develop and test

You need:

  • Python3
  • (optional) Running instance of a PostgreSQL database
  • (optional) Docker

Run locally

  1. Add the DB_ADDRESS, DB_PORT, DB_NAME, DB_USERNAME and DB_PASSWORD environment variables
  2. Install the requirements: pip install -r requirements.txt
  3. Run the fetcher: python3 ./main.py
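
Before running the fetcher, it can help to confirm that the variables from step 1 are actually set. The following is a minimal sketch of such a check, not part of the repository: the names REQUIRED_DB_VARIABLES and missing_db_variables are hypothetical, and DB_PORT is included only because step 1 lists it.

```python
import os

# Variables named in step 1 of "Run locally".
REQUIRED_DB_VARIABLES = (
    "DB_ADDRESS",
    "DB_PORT",
    "DB_NAME",
    "DB_USERNAME",
    "DB_PASSWORD",
)


def missing_db_variables(env=None):
    """Return the names from REQUIRED_DB_VARIABLES that are unset or empty.

    `env` defaults to the process environment; passing a plain dict
    makes the function easy to try out without touching the shell.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_DB_VARIABLES if not env.get(name)]


if __name__ == "__main__":
    missing = missing_db_variables()
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
    else:
        print("All database variables are set.")
```

Run as a script, it prints the names that still need to be exported, if any.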

Run locally using Docker

  1. Set STAGE=test and add the DB_ADDRESS, DB_PORT, DB_NAME, DB_USERNAME and DB_PASSWORD environment variables
  2. Run docker-compose up

Environment variables

| Variable name | Default value | Description |
| --- | --- | --- |
| DB_USERNAME | | Postgres database adapter user name |
| DB_PASSWORD | | Postgres database adapter password |
| DB_ADDRESS | | Postgres database adapter address |
| DB_NAME | | Postgres database adapter name |
| DB_PORT | 5432 | Postgres database adapter port |
| SQLITE | | SQLITE adapter file path |
| CSV | | CSV adapter file path |
| VALIDATE_INPUT_DATA | False | Validate input data |
| SLIDING_WINDOW_DAYS | | Sliding window, number of days in the past to process |
| RUN_ONLY_PLUGINS | ALL | Run selected plugins from given list, run all plugins if empty |
| LOGLEVEL | DEBUG | Log level |
| SYS_EMAIL | | Notifications SMTP username |
| SYS_EMAIL_PASS | | Notifications SMTP password |
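
To illustrate how the defaults listed above could be applied, here is a minimal sketch that reads a few of these variables into typed values. It is not the repository's own configuration code: Settings, load_settings and window_start are hypothetical names, and both the comma-separated format for RUN_ONLY_PLUGINS and the accepted spellings of a true value for VALIDATE_INPUT_DATA are assumptions.

```python
import os
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional


@dataclass
class Settings:
    db_port: int
    validate_input_data: bool
    sliding_window_days: Optional[int]     # None: no window configured
    run_only_plugins: Optional[List[str]]  # None: run all plugins
    log_level: str


def load_settings(env=None):
    """Read selected variables, falling back to the documented defaults."""
    env = os.environ if env is None else env
    window = (env.get("SLIDING_WINDOW_DAYS") or "").strip()
    plugins = (env.get("RUN_ONLY_PLUGINS") or "ALL").strip()
    # Assumption: plugin names are separated by commas.
    selected = None
    if plugins != "ALL":
        selected = [p.strip() for p in plugins.split(",") if p.strip()]
    validate = (env.get("VALIDATE_INPUT_DATA") or "False").strip().lower()
    return Settings(
        db_port=int(env.get("DB_PORT") or "5432"),
        # Assumption: "1", "true" and "yes" all mean enabled.
        validate_input_data=validate in ("1", "true", "yes"),
        sliding_window_days=int(window) if window else None,
        run_only_plugins=selected,
        log_level=env.get("LOGLEVEL") or "DEBUG",
    )


def window_start(today, sliding_window_days):
    """Earliest date to process, or None when no window is configured."""
    if sliding_window_days is None:
        return None
    return today - timedelta(days=sliding_window_days)
```

For example, with SLIDING_WINDOW_DAYS=7 and a run on 10 June 2020, window_start(date(2020, 6, 10), 7) gives 3 June 2020 as the earliest date to process.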

Contribute

We need fetchers!

Create a fetcher for a country that is not listed yet and send us a pull request. Use only official sources, or sources derived from official sources.

You can find example code for a fetcher in /src/plugins/_EXAMPLE/example_fetcher.py
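
As a rough idea of what a fetcher does, the sketch below parses CSV text from a source into one record per date and region. It is illustrative only and does not reproduce the repository's base class or interface, which are defined in the example file above. The class name, the SOURCE, COUNTRY and COUNTRY_CODE values, the input column names and the record keys are all hypothetical; check the database structure page for the actual fields.

```python
import csv
import io
from datetime import datetime


class ExampleCountryFetcher:
    """Illustrative stand-in, not the repository's fetcher interface."""

    SOURCE = "XXX_EXAMPLE"    # hypothetical fetcher name
    COUNTRY = "Exampleland"   # hypothetical country
    COUNTRY_CODE = "XXX"      # hypothetical country code

    def parse(self, csv_text):
        """Turn CSV text with date, region, confirmed and dead columns
        into a list of record dictionaries."""
        records = []
        for row in csv.DictReader(io.StringIO(csv_text)):
            region = row["region"].strip()
            records.append({
                "source": self.SOURCE,
                "date": datetime.strptime(row["date"], "%Y-%m-%d").date(),
                "country": self.COUNTRY,
                "countrycode": self.COUNTRY_CODE,
                # Assumption: an empty region marks a national total,
                # stored here with adm_area_1 set to None.
                "adm_area_1": region or None,
                "confirmed": int(row["confirmed"]),
                "dead": int(row["dead"]),
            })
        return records


if __name__ == "__main__":
    sample = (
        "date,region,confirmed,dead\n"
        "2020-05-01,,100,3\n"
        "2020-05-01,North,40,1\n"
    )
    for record in ExampleCountryFetcher().parse(sample):
        print(record)
```

Keeping the parsing step separate from downloading and from writing to the database makes it possible to test a new fetcher against a saved sample of the source data.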