This repository houses open datasets created by Jataware for research related to the COVID-19 response efforts.
Disclaimer: the datasets contained in this repository have been machine curated and are not vetted by human experts. They should be taken as representative of events that occurred on the ground, but should not be considered authoritative sources or ground truth.
Nonpharmaceutical Interventions, or NPIs, are policy actions taken by communities to mitigate the spread of diseases such as COVID-19. These interventions include stay-at-home orders, school closures, and social distancing recommendations.
We have an ongoing project to scan the internet for news articles that feature country-, state-, county-, and city-level reporting on NPIs. We then categorize the NPI that each article is most likely describing in order to estimate when and where various NPIs were implemented.
To facilitate usage in spreadsheet tools like Excel, the underlying article text is not stored in the `.csv` files, but is available in the `.jsonl` (JSON lines) files.
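
As a rough illustration of working with the `.jsonl` files, the sketch below reads JSON Lines records one line at a time using only the Python standard library. The field names `geography`, `npi`, and `text` are assumptions made for the example and may not match the actual schema; the sample records are invented.

```python
import io
import json


def read_jsonl(handle):
    """Yield one dict per non-empty line of a JSON Lines stream."""
    for line in handle:
        line = line.strip()
        if line:
            yield json.loads(line)


# Hypothetical two-record sample; the real field names may differ.
sample = io.StringIO(
    '{"geography": "Example County", "npi": "lockdown", '
    '"text": "Officials announced a curfew."}\n'
    '{"geography": "Example City", "npi": "travel", '
    '"text": "New travel limits take effect."}\n'
)

records = list(read_jsonl(sample))

# Look up the article text for each geography in the sample.
texts_by_geography = {r["geography"]: r["text"] for r in records}
```

To read one of the repository's files instead of the in-memory sample, pass an open file handle, for example `open("CDC-CRI-City-NPIs.jsonl", encoding="utf-8")`, to `read_jsonl`.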
- `state_of_emergency`: the geography has implemented a state of emergency
- `shelter_in_place`: the geography has implemented a shelter in place or stay home order
- `lockdown`: the geography has implemented a curfew or lockdown
- `quarantine`: the geography has instituted some type of quarantine measure
- `social_distance`: the geography has required social distancing measures
- `disaster_declaration`: the geography has issued a disaster declaration
- `school_business_closure`: the geography has ordered schools or businesses to close
- `travel`: the geography has implemented travel restrictions
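
One simple way a reader might turn categorized articles into a rough implementation date is to take the earliest article date for each geography and NPI pair. The sketch below does that under stated assumptions: the row keys `geography`, `npi`, and `date` (ISO format) are hypothetical, the sample rows are invented, and this is not necessarily how the published estimates were produced.

```python
from datetime import date

# The eight NPI categories documented above.
NPI_CATEGORIES = {
    "state_of_emergency",
    "shelter_in_place",
    "lockdown",
    "quarantine",
    "social_distance",
    "disaster_declaration",
    "school_business_closure",
    "travel",
}


def earliest_npi_dates(rows):
    """Map (geography, npi) to the earliest article date seen for that pair."""
    earliest = {}
    for row in rows:
        if row["npi"] not in NPI_CATEGORIES:
            continue  # skip labels outside the documented categories
        key = (row["geography"], row["npi"])
        day = date.fromisoformat(row["date"])
        if key not in earliest or day < earliest[key]:
            earliest[key] = day
    return earliest


# Invented rows with hypothetical column names.
rows = [
    {"geography": "Example County", "npi": "lockdown", "date": "2020-03-20"},
    {"geography": "Example County", "npi": "lockdown", "date": "2020-03-17"},
    {"geography": "Example County", "npi": "travel", "date": "2020-03-25"},
    {"geography": "Example County", "npi": "unknown_label", "date": "2020-03-01"},
]

estimates = earliest_npi_dates(rows)
```

Because the data is machine curated, an earliest-date estimate like this is sensitive to a single misclassified article, so it is best treated as a starting point for manual review.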
U.S. county-level NPI data is available in `County-NPIs.csv`.
U.S. city-level NPI data is available in `City-NPIs.csv`.
We have a special collection focused on the cities in the CDC's Cities Readiness Initiative (CRI). This dataset, `CDC-CRI-City-NPIs.jsonl`, includes additional fields that capture sentences related to the easing of NPIs. Fields prefixed with `RGX_` are produced by regular-expression-based filters for quantities and terms. Quantities (such as numbers of tests) are returned as an array of quantity extractions. Terms, such as "social distancing", are counted and returned as frequency counts.
Extraction fields that are not prefixed with `RGX_` are Odinson-based extractions. Odinson is used to perform a set of rule-based extractions that seek to identify relevant syntactic patterns. When a pattern is identified, the relevant sentences are returned for additional human verification and validation.
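
To make the two kinds of `RGX_` output concrete, here is a minimal sketch of regular-expression filtering that returns term frequency counts and an array of quantity extractions. The term list, the pattern, and the function names are assumptions chosen for illustration; the actual filters behind the dataset are not documented here and are likely more extensive.

```python
import re

# Hypothetical term list and quantity pattern, for illustration only.
TERMS = ["social distancing", "reopening"]
QUANTITY_RE = re.compile(r"\b(\d[\d,]*)\s+(tests|ventilators)\b", re.IGNORECASE)


def count_terms(text, terms=TERMS):
    """Return a frequency count for each term (case-insensitive)."""
    lowered = text.lower()
    return {
        term: len(re.findall(r"\b" + re.escape(term) + r"\b", lowered))
        for term in terms
    }


def extract_quantities(text):
    """Return every 'number unit' match as a list of strings."""
    return [f"{m.group(1)} {m.group(2)}" for m in QUANTITY_RE.finditer(text)]


text = (
    "Social distancing remains in place. The city ran 300 tests and "
    "ordered 1,200 tests; social distancing rules ease next week."
)

term_counts = count_terms(text)
quantities = extract_quantities(text)
```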
Country-level NPI data is available in `World-NPIs.csv`.
We have also collected data on healthcare system capacity. To accomplish this, we've relied on information extraction from open source news articles. We have attempted to automatically extract the following:
- `tests`: number of tests and/or test-kits
- `ventilators`: number of ventilators or respirators
- `beds`: number of hospital beds and/or ICU beds
- `ppe`: number of n95 masks, surgical masks, PPE
We can loosely associate these metrics with a geography based on the news article. Note that we do not perform entity resolution on the extracted metrics, so they can take a variety of types:
Category | Type | Example |
---|---|---|
tests | tests | 300 tests |
tests | test-kits | 300 test-kits |
tests | test kits | 300 test kits |
tests | COVID-19 tests | 300 COVID-19 tests |
ventilators | ventilators | 100 ventilators |
ventilators | respirators | 100 respirators |
beds | hospital beds | 50 hospital beds |
beds | ICU beds | 50 ICU beds |
ppe | n95 masks | 1,000 n95 masks |
ppe | medical masks | 1,000 medical masks |
ppe | surgical masks | 1,000 surgical masks |
ppe | ppe | 1,000 ppe |
ppe | personal protective equipment | 1,000 personal protective equipment |
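
Since the extracted metrics are not entity resolved, a consumer of the data may want to fold the types back into their categories. The sketch below shows one possible approach using the type-to-category pairs from the table above; the `normalize` function and the assumption that each extraction is a string of the form "number type" are illustrative and not part of the dataset.

```python
import re

# Type-to-category pairs taken from the table above (lowercased).
TYPE_TO_CATEGORY = {
    "tests": "tests",
    "test-kits": "tests",
    "test kits": "tests",
    "covid-19 tests": "tests",
    "ventilators": "ventilators",
    "respirators": "ventilators",
    "hospital beds": "beds",
    "icu beds": "beds",
    "n95 masks": "ppe",
    "medical masks": "ppe",
    "surgical masks": "ppe",
    "ppe": "ppe",
    "personal protective equipment": "ppe",
}

# Assumed shape of an extraction: a number, whitespace, then the type.
EXTRACTION_RE = re.compile(r"^\s*(\d[\d,]*)\s+(.+?)\s*$")


def normalize(extraction):
    """Return (category, count) for an extraction, or None if unrecognized."""
    match = EXTRACTION_RE.match(extraction)
    if match is None:
        return None
    category = TYPE_TO_CATEGORY.get(match.group(2).lower())
    if category is None:
        return None
    count = int(match.group(1).replace(",", ""))  # "1,000" becomes 1000
    return category, count


examples = ["300 COVID-19 tests", "1,000 n95 masks", "50 ICU beds", "a few masks"]
normalized = [normalize(e) for e in examples]
```

Returning `None` for anything outside the table keeps unexpected types visible rather than silently assigning them to a category.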
State-level healthcare capacity data can be found in `State-Capacity-Measures.csv`.
Country-level healthcare capacity data can be found in `Global-Capacity-Measures.csv`.