This repo scrapes the data from Google's COVID-19 Community Mobility Reports (https://www.google.com/covid19/mobility/). The code is released freely under the MIT Licence and provided 'as-is'.
You'll need the packages `dplyr`, `purrr`, `xml2`, `rvest`, `pdftools` and `countrycode`. These are all on CRAN.
- 2020-04-04 16:51: `get_all_data.R` script pulls data from all reports, saved in the data folder
- 2020-04-04 16:26: Added comments to the functions, moved tidyverse library call to scripts
- 2020-04-03 18:22: Converted code into functions, added date and country codes to output tables, created functions for region reports (US state-level data)
- 2020-04-03 12:59: First version, scrape of PDF and extract of data into CSV
The `R/functions.R` script provides a number of functions to interact with the Google COVID-19 Community Mobility Reports:
- `get_country_list()` gets a list of the country reports available
- `get_national_data()` extracts the overall figures from a country report
- `get_subnational_data()` extracts the locality figures from a country report
- `get_region_list()` gets a list of the region reports available (currently just US states)
- `get_region_data()` extracts the overall figures from a region report
- `get_subregion_data()` extracts the locality figures from a region report
The functions return tibbles providing the headline mobility report figures; they do not extract or interact with the trend lines provided in the chart reports. The tibbles have the following columns:
- `date`: the date from the PDF file name
- `country`: the ISO 2-character country code from the PDF file name
- `region`: for region reports, the region name
- `entity`: the datapoint label, one of the six entity values listed in the table below
- `value`: the datapoint value; these are presented as percentages in the report but are converted to decimal representation in the tables
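As an illustration of the percentage-to-decimal conversion described for the `value` column, here is a minimal sketch in base R. The helper name `pct_to_decimal()` is hypothetical and is not taken from `R/functions.R`; it assumes the figures arrive as strings such as `"-85%"` or `"+12%"`.

```r
# Hypothetical helper (not part of R/functions.R): convert percentage
# strings such as "-85%" or "+12%" into decimal numbers.
pct_to_decimal <- function(x) {
  # Trim whitespace, drop the literal percent sign, then scale by 100
  as.numeric(gsub("%", "", trimws(x), fixed = TRUE)) / 100
}

# Made-up example inputs
converted <- pct_to_decimal(c("-85%", "+12%", "0%"))
print(converted)
```

Under this convention, a value of `-0.85` in the output tables corresponds to a reported change of -85%.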
There are six mobility entities presented in the reports:
| entity value | Description |
|---|---|
| retail_recr | Retail & recreation: Mobility trends for places like restaurants, cafes, shopping centers, theme parks, museums, libraries, and movie theaters. |
| grocery_pharm | Grocery & pharmacy: Mobility trends for places like grocery markets, food warehouses, farmers markets, specialty food shops, drug stores, and pharmacies. |
| parks | Parks: Mobility trends for places like national parks, public beaches, marinas, dog parks, plazas, and public gardens. |
| transit | Transit stations: Mobility trends for places like public transport hubs such as subway, bus, and train stations. |
| workplace | Workplaces: Mobility trends for places of work. |
| residential | Residential: Mobility trends for places of residence. |
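For charts or tables it can be handy to map the short entity values back to readable labels. The following is a sketch only: the `entity_labels` vector and the example data frame are illustrative and not part of the repo, and the values are made up.

```r
# Lookup from entity value to a readable label (labels taken from the table above)
entity_labels <- c(
  retail_recr   = "Retail & recreation",
  grocery_pharm = "Grocery & pharmacy",
  parks         = "Parks",
  transit       = "Transit stations",
  workplace     = "Workplaces",
  residential   = "Residential"
)

# Made-up example rows shaped like the function output (entity + value)
example <- data.frame(
  entity = c("parks", "transit", "residential"),
  value  = c(-0.52, -0.75, 0.15),
  stringsAsFactors = FALSE
)

# Add a label column by indexing the named vector with the entity values
example$label <- unname(entity_labels[example$entity])
print(example)
```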
This code is also provided in mobility_report_scraping.R
library(tidyverse)
source("R/functions.R")
# get list of countries
# default url is https://www.google.com/covid19/mobility/
countries <- get_country_list()
# extract the url for the uk
uk_url <- countries %>% filter(country == "GB") %>% pull(url)
# extract overall data for the uk
uk_overall_data <- get_national_data(uk_url)
# extract locality data for the uk
uk_location_data <- get_subnational_data(uk_url)
# get list of us states
states <- get_region_list()
# extract the url for new york
ny_url <- states %>% filter(region == "New York") %>% pull(url)
# extract overall data for new york state
ny_data <- get_region_data(ny_url)
# extract locality data for new york state
ny_locality_data <- get_subregion_data(ny_url)
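Because the tables are in long format (one row per entity), comparing countries usually means stacking tables and spreading the entities into columns. A minimal sketch, using base R so it runs without extra packages; the two data frames below are made-up stand-ins for the output of `get_national_data()`, not real report figures.

```r
# Made-up stand-ins for two national tables (columns as described above)
uk <- data.frame(
  date = "2020-03-29", country = "GB",
  entity = c("retail_recr", "parks"),
  value = c(-0.85, -0.52),
  stringsAsFactors = FALSE
)
fr <- data.frame(
  date = "2020-03-29", country = "FR",
  entity = c("retail_recr", "parks"),
  value = c(-0.88, -0.82),
  stringsAsFactors = FALSE
)

# Stack the long tables, then spread entities into one column each
combined <- rbind(uk, fr)
wide <- reshape(
  combined,
  idvar = c("date", "country"),
  timevar = "entity",
  direction = "wide"
)
print(wide)
```

The result has one row per date and country, with columns named `value.retail_recr` and `value.parks`. With the tidyverse loaded, `tidyr::pivot_wider()` would be the more usual way to do the same reshaping.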