
R package for accessing and analyzing CDC NHANES data

Primary language: R. License: Apache License 2.0 (Apache-2.0).

RNHANES

RNHANES is an R package, developed by Silent Spring Institute, for accessing and analyzing data from the CDC's National Health and Nutrition Examination Survey (NHANES).



Features

  • Download and search NHANES variable and data file lists
  • Download and cache NHANES data files
  • Compute survey-weighted detection frequencies, quantiles, and geometric means
  • Plot weighted histograms
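To make "survey-weighted detection frequency" concrete, here is a minimal base-R sketch of the point estimate: each participant counts in proportion to their sample weight rather than as a single observation. It assumes the NHANES comment-code convention that 0 means the result is at or above the limit of detection and 1 means it is below. The helper `weighted_detection_frequency()` and the toy data are hypothetical and not part of RNHANES, and the sketch ignores strata and PSUs, so it gives no design-based standard error.

```r
# Hypothetical toy data standing in for an NHANES lab file: a measurement,
# its comment code (assumed: 0 = at/above LOD, 1 = below LOD), and a weight.
toy <- data.frame(
  URXTRS   = c(12.1, 0.2, 45.0, 3.3, 0.2),
  URDTRSLC = c(0, 1, 0, 0, 1),
  WTSA2YR  = c(1000, 3000, 2000, 1500, 2500)
)

# Hypothetical helper: weighted share of samples at or above the LOD.
# Rows with a missing comment code or weight are dropped first.
weighted_detection_frequency <- function(comment, weights) {
  ok <- !is.na(comment) & !is.na(weights)
  detected <- comment[ok] == 0
  sum(weights[ok] * detected) / sum(weights[ok])
}

freq <- weighted_detection_frequency(toy$URDTRSLC, toy$WTSA2YR)
freq
```

With these toy weights, 3 of 5 samples are detected (60% unweighted), but the weighted detection frequency is 0.45 because the two non-detects carry larger weights.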

Install

You can install the latest stable version from GitHub:

# install.packages("devtools")  # run once if devtools is not already installed
library(devtools)

install_github("silentspringinstitute/RNHANES")

The CRAN version is older and may fail with errors when working with more recent NHANES cycles:

install.packages("RNHANES")

Documentation

You can browse the package's documentation on the RNHANES website: http://silentspringinstitute.github.io/RNHANES/.

Examples

library(RNHANES)

# Download environmental phenols & parabens data from the 2011-2012 survey cycle
dat <- nhanes_load_data("EPH", "2011-2012")

# Download the same data, but this time include demographics data (which includes sample weights)
dat <- nhanes_load_data("EPH", "2011-2012", demographics = TRUE)

# Find the sample size for urinary triclosan
nhanes_sample_size(dat,
  column = "URXTRS",
  comment_column = "URDTRSLC",
  weights_column = "WTSA2YR")

# Compute the detection frequency of urinary triclosan
nhanes_detection_frequency(dat,
  column = "URXTRS",
  comment_column = "URDTRSLC",
  weights_column = "WTSA2YR")

# Compute the 95th and 99th percentiles of urinary triclosan
nhanes_quantile(dat,
  column = "URXTRS",
  comment_column = "URDTRSLC",
  weights_column = "WTSA2YR",
  quantiles = c(0.95, 0.99))
  
# Compute geometric mean of urinary triclosan
nhanes_geometric_mean(dat,
  column = "URXTRS",
  weights_column = "WTSA2YR")

# Plot a histogram of the urinary triclosan distribution
nhanes_hist(dat,
  column = "URXTRS",
  comment_column = "URDTRSLC",
  weights_column = "WTSA2YR")

# Build a survey design object for use with the survey package
design <- nhanes_survey_design(dat, weights_column = "WTSA2YR")
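The features list describes these quantiles as survey-weighted. As a rough illustration of what weighting does to a quantile, the sketch below inverts a weighted empirical CDF: it sorts the values, accumulates normalized weights, and returns the first value whose cumulative weight reaches each probability. `weighted_quantile()` and the toy data are hypothetical; the survey package supports several quantile definitions and design-based confidence intervals, so its results can differ from this simple rule.

```r
# Hypothetical helper: quantiles from a weighted empirical CDF.
weighted_quantile <- function(x, w, probs) {
  ok <- !is.na(x) & !is.na(w)
  x <- x[ok]
  w <- w[ok]
  ord <- order(x)
  x <- x[ord]
  cdf <- cumsum(w[ord]) / sum(w)  # non-decreasing, ends at (about) 1
  # findInterval(..., left.open = TRUE) counts cdf values strictly below p,
  # so +1 is the index of the first value whose cumulative weight reaches p.
  idx <- findInterval(probs, cdf, left.open = TRUE) + 1
  x[pmin(idx, length(x))]  # guard against rounding when p is close to 1
}

# Hypothetical toy measurements and sample weights
conc <- c(12.1, 0.2, 45.0, 3.3, 0.2)
wts  <- c(1000, 3000, 2000, 1500, 2500)

q <- weighted_quantile(conc, wts, c(0.5, 0.75, 0.95))
q
```

The weighted median here is 0.2, not the unweighted median of 3.3, because the two 0.2 values together carry 55% of the total weight.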

Geometric mean

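An easy way to calculate geometric means is now built into RNHANES via the nhanes_geometric_mean function, but the CRAN release has not been updated to include it yet. If you are using the CRAN version, you can instead compute a geometric mean by taking the survey-weighted arithmetic mean of the log-transformed variable and exponentiating the result. The 95% confidence interval is computed the same way: build it on the log scale, then exponentiate its endpoints. Here's an example: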

library(survey)
library(RNHANES)
library(tidyverse)

dat <- nhanes_load_data("EPHPP_H", "2013-2014", demographics = TRUE) %>%
  filter(!is.na(URXBPH))

des <- nhanes_survey_design(dat, "WTSB2YR")

logmean <- svymean(~log(URXBPH), des, na.rm = TRUE)

# Lower bound of the 95% confidence interval for the geometric mean
exp(logmean[1] - 1.96 * sqrt(attr(logmean, "var")))

# Geometric mean
exp(logmean)[1]

# Upper bound of the 95% confidence interval for the geometric mean
exp(logmean[1] + 1.96 * sqrt(attr(logmean, "var")))
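The log-then-exponentiate identity behind this example can be checked in base R without a survey design. The sketch below uses made-up values and weights (hypothetical, for illustration only) to show that exponentiating a weighted mean of logs equals the weighted product form of the geometric mean, and that back-transforming a log-scale interval gives bounds that are symmetric on the ratio scale rather than the additive scale. It uses qnorm(0.975) in place of the rounded 1.96, and the standard error is an assumed value, not one computed from NHANES.

```r
# Hypothetical positive measurements and sample weights
x <- c(1.2, 3.5, 0.8, 10.4, 2.2)
w <- c(1000, 3000, 2000, 1500, 2500)

# Weighted arithmetic mean on the log scale...
log_mean <- sum(w * log(x)) / sum(w)
# ...exponentiated back gives the weighted geometric mean
geo_mean <- exp(log_mean)

# The same quantity written as a weighted product
geo_mean_direct <- prod(x^(w / sum(w)))

# Back-transform a log-scale 95% interval; se_log is a hypothetical value
se_log <- 0.1
ci <- exp(log_mean + c(-1, 1) * qnorm(0.975) * se_log)

c(lower = ci[1], geo_mean = geo_mean, upper = ci[2])
```

Because the endpoints are exponentiated, the interval is asymmetric around the geometric mean in absolute terms, but geo_mean / lower equals upper / geo_mean.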

Correlations

I recommend using the svycor function from the jtools package to compute survey-weighted Pearson correlations between NHANES variables:

library(RNHANES)
library(tidyverse)
library(jtools)

# Download PAH dataset
nhanes_dat <- nhanes_load_data("PAH_H", "2013-2014", demographics = TRUE)

# Build the survey design object
des <- nhanes_survey_design(nhanes_dat)

svycor(~log(URXP01) + log(URXP04) + log(URXP06) + log(URXP10), design = des, na.rm = TRUE)
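For intuition about what a survey-weighted Pearson correlation estimates, here is a base-R sketch of the weighted point estimate, cross-checked against stats::cov.wt(). The data are simulated and `weighted_cor()` is a hypothetical helper; it uses only the sample weights, not strata or PSUs, so it does not reproduce any design-based inference that svycor can report.

```r
# Hypothetical helper: weighted Pearson correlation (point estimate only).
weighted_cor <- function(x, y, w) {
  ok <- !is.na(x) & !is.na(y) & !is.na(w)
  x <- x[ok]
  y <- y[ok]
  w <- w[ok] / sum(w[ok])           # normalize weights to sum to 1
  mx <- sum(w * x)                  # weighted means
  my <- sum(w * y)
  cxy <- sum(w * (x - mx) * (y - my))
  cxx <- sum(w * (x - mx)^2)
  cyy <- sum(w * (y - my)^2)
  cxy / sqrt(cxx * cyy)             # any variance scaling factor cancels
}

# Simulated, log-normally distributed "exposures" and sample weights
set.seed(1)
a <- rlnorm(50)
b <- a * rlnorm(50, sdlog = 0.5)
wts <- runif(50, 500, 5000)

r_manual <- weighted_cor(log(a), log(b), wts)
r_covwt  <- cov.wt(cbind(log(a), log(b)), wt = wts, cor = TRUE)$cor[1, 2]

c(manual = r_manual, cov.wt = r_covwt)
```

As in the svycor call, the variables are log-transformed first, which is common for right-skewed biomonitoring concentrations.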

Acknowledgements

Thanks to the following people for contributing pull requests: