Wrapper for CDC WONDER mortality data

tothewonder

Lifecycle: experimental R-CMD-check



The goal of tothewonder is to make it easy to access CDC WONDER mortality statistics. This package was created to protest the REPUBLICAN- and NRA-led efforts to suppress research into the underlying causes of gun violence.

Installation

devtools::install_github("yukatapangolin/tothewonder", ref = "master")

Alternatively, download the source zip file from GitHub, unzip it, and install from the local directory:

library(devtools)
## Install from the unzipped source directory
install("C:/path/tothewonder-master")
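A script that depends on the package can check for it before calling any of its functions. The following is a minimal sketch; `needs_install()` is a hypothetical helper written for this illustration and is not part of tothewonder or devtools.

```r
## Hypothetical helper (not part of tothewonder): TRUE when the namespace
## of `pkg` cannot be loaded, which usually means it is not installed
needs_install <- function(pkg) {
  !requireNamespace(pkg, quietly = TRUE)
}

if (needs_install("tothewonder")) {
  message("tothewonder is not installed; see the Installation section")
}
```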

Usage

Step 1: Start a Session

To use this package you’ll need to start a session at the CDC WONDER website and agree to their data use restrictions.

Depending on which database you want to access you can use one of the following functions to start a session:

session_ucd99() - Underlying Cause of Death, 1999-2020

session_mcd_provisional() - Provisional Mortality Statistics, 2018 through Last Month

session_mcd_final18() - Multiple Cause of Death, 2018-2021, Single Race

After you agree to abide by the data use restrictions, each function returns a string containing the CDC WONDER session URL.
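Because the session functions return a plain string, it can be worth checking the value before passing it on as `wonder_url`. The sketch below only verifies that the value is a single https URL on the wonder.cdc.gov host; the exact shape of a session URL is not documented here, so nothing beyond the host is assumed. `is_wonder_url()` is a hypothetical helper, not a tothewonder function.

```r
## Hypothetical helper (not part of tothewonder): check that a value looks
## like a single https URL on the CDC WONDER host
is_wonder_url <- function(x) {
  is.character(x) && length(x) == 1 && !is.na(x) &&
    grepl("^https://wonder\\.cdc\\.gov/", x)
}

is_wonder_url("https://wonder.cdc.gov/controller/saved/D76/D296F514") # TRUE
is_wonder_url("")                                                     # FALSE
```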

Note that this package may violate the terms of use of the CDC WONDER website: the public API is restricted to returning data at the national level, whereas tothewonder allows you to retrieve sub-national data. However, the web application already lets you retrieve sub-national data, so using this package is probably no big deal. The API documentation states:

However, in keeping with the vital statistics policy for public data sharing, only national data are available for query by the API. Queries for mortality and births statistics from the National Vital Statistics System cannot limit or group results by any location field, such as Region, Division, State or County, or Urbanization (urbanization categories map to specific geographic counties). For example, in the D76 online database for Detailed Mortality 1999-2013, the location fields are D76.V9, D76.V10 and D76.V27, and the urbanization fields are D76.V11 and D76.V19. These "sub-national" data fields cannot be grouped by or limited via the API, although these fields are available in the web application. https://wonder.cdc.gov/wonder/help/WONDER-API.html

It’s the opinion of the tothewonder author that these restrictions make the official API useless. Additionally, since the web application already permits querying at the sub-national level the author sees no problem in automating this type of query as long as users follow the data use restrictions. The author of this package furthermore believes that these limitations were put in place thanks to RUSSIAN PUPPET PRESIDENT DONALD TRUMP to prevent researchers from investigating GUN VIOLENCE.

Sanctions for Violating Rules:

Researchers who violate the terms of the data use restrictions will lose access to WONDER and their sponsors and institutions will be notified. Researchers who are suspected of violating the rules may be prevented from using WONDER until an investigation can be completed. Deliberately making a false statement in any matter within the jurisdiction of any department or agency of the Federal government violates 18 USC 1001 and is punishable by a fine of up to $10,000 or up to 5 years in prison, or both.

Step 2: Download Your Data

Depending on which database you want to access you can use one of the following functions to download your data:

ucd99() - Underlying Cause of Death, 1999-2020

mcd_provisional() - Provisional Mortality Statistics, 2018 through Last Month

mcd_final18() - Multiple Cause of Death, 2018-2021, Single Race

Examples

library(tothewonder)

## Start a UCD (https://wonder.cdc.gov/ucd-icd10.html) session and
## agree to the data use restrictions
url_ucd99 <- session_ucd99()
## Download monthly non-Hispanic Black firearm deaths data for
## Missouri (FIPS code 29). This would include homicides, suicides, accidents
## and legal interventions/operations of war. Single-year ages 15-55 only
df <- ucd99(wonder_url = url_ucd99,
            group_by_1 = "Year",
            group_by_2 = "Month",
            group_by_3 = "None",
            group_by_4 = "None",
            show_totals = FALSE,
            show_confidence_interval = FALSE,
            show_standard_error = FALSE,
            age = 15:55,
            period = 1999:2020,
            residence_urbanization_year = "2013",
            residence_urbanization = "All Categories",
            residence_fips = "29", # Missouri has FIPS 29
            weekday = c("All Weekdays"),
            autopsy = c("All Values"),
            place_of_death = c("All Places"),
            gender = c("All"),
            hispanic_origin = "Not Hispanic or Latino",
            race = "Black or African American",
            ucd_option = "Injury Intent and Mechanism",
            ucd_injury_intent = "All Causes of Death",
            ucd_injury_mechanism = "Firearm"
)

## Start a session with the Provisional Mortality by Multiple Cause of Death db
url_provisional <- session_mcd_provisional()
## Download weekly data where the underlying cause of death was homicide, and
## where the victim was African American
## Only include the "15-24","25-34", and "35-44" ten year age groups
df <- mcd_provisional(wonder_url = url_provisional,
                      group_by_1 = "MMWR Week",
                      group_by_2 = "None",
                      group_by_3 = "None",
                      group_by_4 = "None",
                      show_totals = FALSE,
                      show_confidence_interval = FALSE,
                      show_standard_error = FALSE,
                      age = c("15-24","25-34", "35-44"),
                      period_option = "MMWR",
                      period = 2018:2022,
                      residence_fips = "All",
                      residence_urbanization_year = c("2013"),
                      residence_urbanization = "All Categories",
                      occurrence_fips = "All",
                      occurrence_urbanization_year = c("2013"),
                      occurrence_urbanization = "All Categories",
                      autopsy = c("All Values"),
                      place_of_death = c("All Places"),
                      gender = c("All Genders"),
                      hispanic_origin = "Not Hispanic or Latino",
                      race_option = "Single Race 6",
                      race = "Black or African American",
                      ucd_option = "Injury Intent and Mechanism",
                      ucd_injury_intent = "Homicide",
                      ucd_injury_mechanism = "All Causes of Death",
                      mcd_option = "MCD - ICD-10 Codes",
                      mcd_icd_codes = "All",
                      mcd_icd_codes_and = "All")
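CDC WONDER replaces small sub-national counts with the label "Suppressed", so a count column may come back as text rather than numbers. The sketch below shows one way to convert such a column, assuming the label arrives as a string in the returned data frame. The `Year.Code` and `Deaths` column names are borrowed from the complex example in this README, the rows are invented for illustration, and `to_count()` is a hypothetical helper.

```r
## Hypothetical helper: convert a WONDER count column to numeric, mapping
## any non-numeric label such as "Suppressed" to NA
to_count <- function(x) {
  x <- as.character(x)
  out <- rep(NA_real_, length(x))
  is_num <- grepl("^[0-9]+$", x)
  out[is_num] <- as.numeric(x[is_num])
  out
}

## Made-up rows for illustration only; not real WONDER output
example_df <- data.frame(
  Year.Code = c(2018, 2019, 2020),
  Deaths = c("12", "Suppressed", "31"),
  stringsAsFactors = FALSE
)
example_df$Deaths <- to_count(example_df$Deaths)

## Suppressed cells are excluded from the total, so it is a lower bound
sum(example_df$Deaths, na.rm = TRUE)
```

Keeping suppressed cells as `NA` rather than zero makes it explicit that any total computed from them undercounts the true figure.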

You can also save a query to a unique link on the WONDER website by using the save parameter:

save_session <- session_ucd99()
ucd99(wonder_url = save_session,
      save = TRUE,
      group_by_1 = "Year",
      group_by_2 = "Month",
      group_by_3 = "None",
      group_by_4 = "None",
      show_totals = FALSE,
      show_confidence_interval = FALSE,
      show_standard_error = FALSE,
      age = 15:55,
      period = 1999:2020,
      residence_urbanization_year = "2013",
      residence_urbanization = "All Categories",
      residence_fips = "29", # Missouri has FIPS 29
      weekday = c("All Weekdays"),
      autopsy = c("All Values"),
      place_of_death = c("All Places"),
      gender = c("All"),
      hispanic_origin = "Not Hispanic or Latino",
      race = "Black or African American",
      ucd_option = "Injury Intent and Mechanism",
      ucd_injury_intent = "All Causes of Death",
      ucd_injury_mechanism = "Firearm"
)

Which should output something like:

https://wonder.cdc.gov/controller/saved/D76/D296F514

Each query you save generates a unique link, even if all the parameters are the same.
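When keeping a log of saved queries, it can help to split the link into its parts. The sketch below assumes, based only on the single example link above, that the last two path segments are the database ID (D76 here) and the saved-query ID. `parse_saved_link()` is a hypothetical helper, not a tothewonder function.

```r
## Hypothetical helper: split a saved-query link into its last two path
## segments, assumed here to be the database ID and the query ID
parse_saved_link <- function(link) {
  parts <- strsplit(sub("/+$", "", link), "/", fixed = TRUE)[[1]]
  n <- length(parts)
  list(database = parts[n - 1], query_id = parts[n])
}

saved <- parse_saved_link("https://wonder.cdc.gov/controller/saved/D76/D296F514")
saved$database # "D76"
saved$query_id # "D296F514"
```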

Complex Example

Firearm homicide rate by county level presidential vote in 2020

library(tothewonder)
library(tidyverse)

## from https://github.com/tonmcg/US_County_Level_Election_Results_08-20
url <- paste0("https://raw.githubusercontent.com/yukatapangolin/",
               "US_County_Level_Election_Results_08-20/",
               "585cb48ccc17fef3640db33eef952cf613c67e95/",
               "2020_US_County_Level_Presidential_Results.csv")
trump_2020 <- read.csv(url,
                       colClasses = c("character", "character",
                                      "character", "integer",
                                      "integer", "integer",
                                      "integer", "numeric",
                                      "numeric", "numeric"))
trump_2020$winner <- ifelse(trump_2020$per_gop > trump_2020$per_dem,
                            "Trump", "Biden")
trump_2020 <- subset(trump_2020, per_gop >= .6 | per_dem >= .6)

trump_2020 <- subset(trump_2020, state_name != "Alaska")
## Shannon County (46113) became Oglala Lakota County (46102) in 2015 and
## WONDER doesn't know about the new FIPS code
## https://en.wikipedia.org/wiki/Oglala_Lakota_County,_South_Dakota
trump_2020$county_fips[trump_2020$county_fips == "46102"] <- "46113"

url_mcd18 <- session_mcd_final18(I_Agree = TRUE)

deaths <- map_dfr( unique(trump_2020$winner), .f = \(x) {
    counties <- unique(subset(trump_2020, winner == x))$county_fips
    ## Parameters to functions use the same defaults
    ## as the ones available at CDC WONDER and you
    ## can omit them
    df <- mcd_final18(wonder_url = url_mcd18,
                      save = FALSE,
                      group_by_1 = "Year",
                      period = 2018:2021,
                      residence_fips = counties,
                      gender = "Male",
                      ucd_option = "Injury Intent and Mechanism",
                      ucd_injury_intent = "Homicide",
                      ucd_injury_mechanism = "Firearm")
    df$winner <- x
    return(df)
})

deaths |>
  mutate(deaths = as.numeric(Deaths)) |>
  ggplot(aes(Year.Code, Crude.Rate, group = winner, color = winner)) +
  geom_line(linewidth = .9) +
  labs(title = "Male Firearm Homicide Rate by 2020 Presidential County Vote",
       subtitle = paste0("Counties that voted for Trump in 2020",
                         " experienced smaller homicide increases."),
       caption = paste0("Source: CDC WONDER. ",
                        "2018-2021 Final Multiple Cause of Death Files")) +
  expand_limits(y = 0) +
  scale_color_manual(
    values = c("#244999",
               "#d22532"),
    labels = c("Biden won with 60%+ of the vote",
               "Trump won with 60%+ of the vote")
  ) +
  ylab("Rate per 100,000") +
  xlab("Year")

[Figure: Male Firearm Homicide Rate by 2020 Presidential County Vote]
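The election file above appears to be read with `county_fips` as a character column because county FIPS codes are five digits long and can begin with zero, which a numeric column would drop. If codes arrive as numbers from another source, they can be padded back to five digits before being passed as `residence_fips`. This is a minimal sketch; `pad_fips()` is a hypothetical helper, and whether tothewonder requires zero-padded codes is an assumption not confirmed by this README.

```r
## Hypothetical helper: format county FIPS codes as five-character strings,
## restoring any leading zero lost when the code was stored as a number
pad_fips <- function(x) {
  sprintf("%05d", as.integer(x))
}

pad_fips(c(1001, 46113)) # "01001" "46113"
```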

Packaging and Releasing

  • Bump the version using the bump2version command (pip install --upgrade bump2version).

  • Update package data by running Rscript data-raw/DATASET.R

  • Run Rscript -e "devtools::document(roclets = c('rd', 'collate', 'namespace'))" to update documentation

  • Run find . -type f -exec sed to correct paths

  • Run Rscript -e 'rmarkdown::render("README.Rmd")' to knit README.Rmd

  • Update the NEWS.md file with any changes.

  • Tag the release: git tag v0.1.0 && git push origin --tags

  • Check out the development branch.