askchisne

An R package that connects to the AskCHIS NE API and retrieves data into R data frames. NOTE: this package does not come with any warranties.

Installation

Install from CRAN: COMING SOON!

Install from GitHub:

devtools::install_github("bogdanrau/askchisne")

Usage

Before using this package, you must obtain an API key from the California Health Interview Survey (CHIS). You will use the API key in all functions to connect to the API and get data.
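Because every function takes the key, it can help to keep it out of your scripts. The sketch below reads it from an environment variable; the variable name ASKCHIS_API_KEY and the helper get_api_key() are assumptions made for this example and are not part of the package.

```r
# Read the API key from an environment variable instead of hard-coding it.
# ASKCHIS_API_KEY is a hypothetical name chosen for this sketch.
get_api_key <- function(var = "ASKCHIS_API_KEY") {
  key <- Sys.getenv(var, unset = "")
  if (!nzchar(key)) {
    stop("No API key found: set the ", var, " environment variable.",
         call. = FALSE)
  }
  key
}
```

The returned value can then be passed wherever an API key is expected, for example getMetadata(apiKey = get_api_key()).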

Base Functions

The following functions can be used to query the AskCHIS NE API:

| Function | Description |
| --- | --- |
| getMetadata() | Obtains all of the metadata from AskCHIS NE. |
| geoSearch() | Returns available locations for a search term. |
| getEstimate() | Returns estimate and attributes for one or more locations in the database. |
| poolEstimate() | Pools estimates for two or more locations. |

The getMetadata() function obtains all of the metadata available in AskCHIS NE. It takes a single argument, the API key:

getMetadata(apiKey = '<YOUR API KEY>')

The resulting data frame will contain:

  • name: the indicator name (required in the getEstimate() function).

  • label: the indicator label.

  • ageGroup: the age group for that specific indicator.

  • year: the data year for the indicator.

  • responseLabel: the response label (can be used if developing Shiny apps).

  • description: a description of the indicator and link to additional resources if non-CHIS.
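Since the name column is what getEstimate() expects, a common first step is to search the metadata for the indicator you need. The helper below is a minimal sketch, not part of the package; it assumes the data frame has the columns listed above, and the keyword in the commented call is only an illustration.

```r
# Hypothetical helper: find indicators whose label contains a keyword
# (case-insensitive) and keep the columns needed to build a query.
find_indicators <- function(metadata, keyword) {
  hits <- grepl(keyword, metadata$label, ignore.case = TRUE)
  metadata[hits, c("name", "label", "ageGroup", "year"), drop = FALSE]
}

# Example call (requires the package and a valid key):
# metadata <- getMetadata(apiKey = '<YOUR API KEY>')
# find_indicators(metadata, "asthma")
```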


The geoSearch() function searches the API for all available geographic locations matching the search string. It requires a search string and the API key:

geoSearch(search = 'YOUR SEARCH TERM', apiKey = '<YOUR API KEY>')

The resulting data frame will contain:

  • geoId: the geoId for each resulting location. This geoId will be required when searching for estimates.

  • name: the location name.

  • geoType: the type of location. This can be a zip code (ZCTA), a city (CITIES), a county (COUNTIES), a legislative district (ASSEMBLY, CONGRESS, SENATE), or the state (STATE).
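A search term can match several geography types at once, so it is often useful to keep one type and turn its geoIds into the comma-separated form used when requesting estimates. The helper below is a sketch under two assumptions: that the result has the geoId and geoType columns described above, and that a single comma-separated string is an acceptable way to pass several locations. The name geo_ids_for() is hypothetical.

```r
# Hypothetical helper: keep one geography type from a geoSearch() result
# and collapse its geoIds into a single comma-separated string.
geo_ids_for <- function(results, geo_type) {
  keep <- toupper(results$geoType) == toupper(geo_type)
  paste(results$geoId[keep], collapse = ",")
}

# Example call (requires the package and a valid key):
# results <- geoSearch(search = 'YOUR SEARCH TERM', apiKey = '<YOUR API KEY>')
# locations <- geo_ids_for(results, "CITIES")
```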


The getEstimate() function retrieves estimates and additional statistical attributes for one or more requested locations:

getEstimate(indicator = 'INDICATOR NAME', attributes = NULL, geoLevel = NULL, locations = NULL, year = NULL, apiKey = '<YOUR API KEY>')

| Parameter | Description |
| --- | --- |
| indicator | The name of the indicator, which can be obtained using the getMetadata() function. |
| attributes | The statistical attributes to return. If not specified, all available attributes are returned: estimate, population, SE, CI_LB95, CI_UB95, CV, MSE. |
| geoLevel | The level of geography for the query: zcta, cities, counties, assembly, congress, senate, state. |
| locations | A comma-separated list of geoIds, one for each location queried (use the geoSearch() function to obtain them). |
| year | The year of the data requested. Available years are listed in the output of the getMetadata() function. If left NULL, the newest available data is returned. |
| apiKey | Your AskCHIS NE API key. |

The resulting data frame will contain:

  • geoId: the geoIds for all locations returned.

  • geoName: the name of all of the locations returned.

  • geoTypeId: the type of location returned (zip code, city, etc.).

  • isSuppressed: a TRUE/FALSE value indicating whether the estimate is suppressed.

  • suppressionReason: description of reason for suppression.

  • population: the population universe for that specific indicator.

  • estimate: the estimate for that indicator.

  • SE: the Standard Error (SE) for that indicator.

  • CI_LB95: the lower bound of the 95% Confidence Interval.

  • CI_UB95: the upper bound of the 95% Confidence Interval.

  • CV: the Coefficient of Variation.

  • MSE: the Mean Square Error.

  • year: the specific year for which the data was either collected or created.

  • unit: the unit of measurement for the estimate.
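Suppressed rows usually need to be set aside before the estimates are reported or plotted. The helper below is a minimal sketch, not part of the package; it assumes isSuppressed is logical (or coercible to logical) and only keeps the reporting columns that are actually present, since a call that requests a subset of attributes may not return all of them.

```r
# Hypothetical helper: drop suppressed rows from a getEstimate() result and
# keep the columns typically needed for reporting.
usable_estimates <- function(estimates) {
  # Rows with a missing flag are treated as not usable.
  ok <- as.logical(estimates$isSuppressed) %in% FALSE
  wanted <- c("geoName", "estimate", "CI_LB95", "CI_UB95", "unit")
  cols <- intersect(wanted, names(estimates))
  estimates[ok, cols, drop = FALSE]
}

# Example call (requires the package and a valid key):
# estimates <- getEstimate(indicator = 'INDICATOR NAME', geoLevel = 'counties',
#                          apiKey = '<YOUR API KEY>')
# usable_estimates(estimates)
```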


The poolEstimate() function combines multiple locations and returns the pooled estimate for those locations:

poolEstimate(indicator = 'INDICATOR NAME', attributes = NULL, locations = 'LIST OF LOCATION geoIds', year = NULL, apiKey = '<YOUR API KEY>')

The resulting data frame will contain the same columns as the response from getEstimate(), except that poolEstimate() returns results for the pooled locations rather than for each individual location.
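Because pooling only makes sense for two or more locations, a quick check on the locations string before calling the API can catch mistakes early. The helper below is a sketch, not part of the package; it assumes the locations are supplied as one comma-separated string, and the commented call assumes the key argument is named apiKey as in the other functions.

```r
# Hypothetical pre-flight check: count the non-empty geoIds in a
# comma-separated locations string.
count_locations <- function(locations) {
  ids <- trimws(strsplit(locations, ",", fixed = TRUE)[[1]])
  sum(nzchar(ids))
}

# Example use (requires the package and a valid key):
# locations <- 'LIST OF LOCATION geoIds'
# if (count_locations(locations) < 2) stop("Pooling needs at least two locations.")
# pooled <- poolEstimate(indicator = 'INDICATOR NAME', locations = locations,
#                        apiKey = '<YOUR API KEY>')
```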