raqdm

Access data from EPA's Air Quality Data Mart in R

raqdm is an R package for directly accessing data from U.S. EPA's Air Quality Data Mart (AQDM). It uses the web interface described here to query the service and returns the results as a data.frame. Data can be queried synchronously or asynchronously, default values can be saved across R sessions, and a simple GUI is available for building requests.

What's New - October 17, 2015

  • Updated to new AQS Data Mart endpoint, https://aqs.epa.gov/api/
  • Added support for synchronous data requests
  • Removed getAQDMavailable() because it appears to no longer be supported by the new endpoint

Installing

You can install from GitHub with devtools:

  devtools::install_github("ebailey78/raqdm")

Getting Access

You will need a username and password from EPA to access the actual data. You can visit the Air Quality Data Mart for information on registering. It's free!

GUI

raqdm GUI

Use the GUI to make requests, set defaults, or create custom function calls to use later.

Setting Defaults

setAQDMdefaults is used to set default values for any of AQDM's query parameters:

  setAQDMdefaults(user = "myemail@example.com", pw = "niftymint56", 
                  param = "44201", frmonly = TRUE)

In this example we set defaults for user, pw, param, and frmonly. Any request you make can omit these parameters and raqdm will insert them for you. They will also be pre-filled whenever you open the GUI.

raqdm will also save these default values and reload them the next time you load the package, preventing you from having to reenter the same information over and over.
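
The way stored defaults and explicit arguments combine can be pictured with a small base R sketch. This only illustrates the idea and is not raqdm's internal code; `merge_with_defaults` and the two lists are hypothetical names, and the rule that an explicit argument wins over a default is an assumption.

```r
# Hypothetical illustration, NOT raqdm's implementation:
# stored defaults fill in whatever the caller leaves out,
# and (by assumption) explicit arguments take precedence.
merge_with_defaults <- function(defaults, args) {
  # modifyList() keeps every default and replaces entries also present in args
  modifyList(defaults, args)
}

defaults <- list(user = "myemail@example.com", param = "44201", frmonly = TRUE)
request  <- merge_with_defaults(defaults, list(param = "42602", state = "18"))

str(request)
```

Here `user` and `frmonly` come from the defaults, `param` is replaced by the explicit value, and `state` is added.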

Requesting data

Synchronous (rawData)

This option is currently disabled by EPA.

Asynchronous (rawDataNotify)

Use getAQDMdata() with synchronous = FALSE to make an asynchronous data request. This will return an AQDMrequest object. When the data is available, use getAQDMrequest() to retrieve the data from the server:

  request <- getAQDMdata(bdate = "20140101", edate = "20140531", 
                         state = "18", county = "089", synchronous = FALSE)
  
  ## Once the request is processed on the server:
  
  data <- getAQDMrequest(request, stringsAsFactors = FALSE)
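
The state and county values in the request above are FIPS codes passed as strings, so leading zeros matter ("089", not "89"). If your codes are stored as numbers, a small helper along these lines can restore the padding. `pad_code` is a hypothetical name and not part of raqdm.

```r
# Hypothetical helper: zero-pad a numeric code to a fixed width.
# State codes are 2 digits wide and county codes are 3 digits wide.
pad_code <- function(x, width) {
  sprintf(paste0("%0", width, "d"), as.integer(x))
}

state  <- pad_code(18, 2)   # two-digit state code
county <- pad_code(89, 3)   # three-digit county code
```

The resulting strings can then be passed as the state and county arguments of getAQDMdata().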

Using raqdm

The Data Mart offers two services for retrieving data from AQS: synchronous and asynchronous. The synchronous service returns data directly to you once the query is complete. The asynchronous service returns an id number that you can use to retrieve the data later. The synchronous service is offline for upgrades, so for now the asynchronous service is the way to get data with raqdm. When you make a data request to EPA with raqdm, EPA sends back the id number and raqdm saves it in a variable. You then use that variable to retrieve the requested data later.

The first thing you should do is set your username and password. raqdm can remember your username and password and will automatically insert them into any request you make. All of the following examples assume you have set your username and password.

  setAQDMuser("ebailey@idem.in.gov", "my_password", save = TRUE)

Adding save = TRUE will cause raqdm to save a file on your computer with your username and password. The next time you use raqdm it will look for this file and load your username and password automatically.

Once you have set your username/password you can start requesting data. We will start with a non-GUI based example. You make data requests with the getAQDMdata() function:

x <- getAQDMdata(state = "55", pc = "CRITERIA", param = "42602", format = "DMCSV", 
                 bdate = "20140101", edate = "20141231", synchronous = FALSE)

In this example we are requesting all 2014 NO2 data for Wisconsin. Notice that we did not provide a username and password because we set them with setAQDMuser(); raqdm will insert them into the request automatically. The assignment x <- stores the request id generated by EPA in the variable x. We will need this later when we retrieve the data. You can use any valid variable name to store the request id.

When AQDM completes the request it sends you an email with a link to the data. Once you get this email you can use the getAQDMrequest() function to retrieve the data from the server rather than downloading it from the link in the email. You need to provide the request id stored in x to the getAQDMrequest() function so that it knows which data to download.

d <- getAQDMrequest(x)

Assuming everything went okay, d is now a data.frame that contains the data you requested.
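
In a script you may not want to wait for the email by hand. The sketch below retries a retrieval a few times before giving up. `retry_fetch` is a hypothetical helper, not part of raqdm, and it assumes that a request which is not ready yet either raises an error or returns something other than a data.frame; check how getAQDMrequest() actually behaves before relying on it.

```r
# Hypothetical helper: call `fetch` repeatedly until it returns a data.frame.
# `fetch` is any zero-argument function, e.g. function() getAQDMrequest(x).
# Assumption: an unfinished request errors or returns a non-data.frame value.
retry_fetch <- function(fetch, attempts = 5, wait = 60) {
  for (i in seq_len(attempts)) {
    # Treat any error as "not ready yet"
    result <- tryCatch(fetch(), error = function(e) NULL)
    if (is.data.frame(result)) {
      return(result)
    }
    if (i < attempts) {
      Sys.sleep(wait)  # pause (in seconds) before the next attempt
    }
  }
  stop("No data after ", attempts, " attempts")
}

# Intended use with raqdm (not run here):
# d <- retry_fetch(function() getAQDMrequest(x), attempts = 10, wait = 120)
```

Passing the retrieval as a function keeps the helper independent of raqdm, so it can be tried out with a stub before pointing it at a real request.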

Using the GUI

The GUI is a convenient way to make data requests or set default parameter values. It is divided into several sections to make it easier to navigate.

Authentication

raqdm Authentication
If you used setAQDMuser() this section should already be filled out. If not, enter the username and password provided by AQDM.

Date Ranges

raqdm Date Ranges
Use these boxes to narrow the timeframe of your request. Sample Dates are the actual sampling dates while Change Dates represent the last time the data was changed in AQS. The first box on the row is for the start date and the second box is for the end date. The format should be YYYYMMDD with no dashes, slashes, or spaces.
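
The same YYYYMMDD format applies to the bdate and edate arguments used elsewhere in this README. If your dates are held as Date objects or ISO strings, base R can convert them. `to_aqdm_date` is a hypothetical helper name, not a raqdm function.

```r
# Hypothetical helper: convert anything as.Date() understands to "YYYYMMDD".
to_aqdm_date <- function(x) {
  format(as.Date(x), "%Y%m%d")
}

bdate <- to_aqdm_date("2014-01-01")
edate <- to_aqdm_date(as.Date("2014-12-31"))
```

The resulting strings contain no dashes, slashes, or spaces and can be typed into the date boxes or passed to getAQDMdata().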

Parameters

raqdm Parameters
Use these boxes to select which measured parameters you are interested in. Parameter Class lets you select whole classes of parameters like HAPS or CRITERIA. Parameter lets you select a single parameter. If you make a selection in Parameter Class, the choices in Parameter will update to show only parameters in that class.

Geography

There are several different ways to define a geographic area in your data request. The Geography section has been divided into 3 tabs to differentiate these ways.

State/County/Site

raqdm State/County/Site
Selecting a state will update the county box with counties in that state. Selecting a county will update the sites box with sites in that county. You must select a state to select a county and you must select a county to select a site.

Latitude/Longitude

raqdm Latitude/Longitude
You can define a geographic bounding box from this tab. There are restrictions on how large a bounding box can be. Refer to AQDM Query Limits for more information.

Other Geography

raqdm Other Geography
You can select a CBSA or CSA from this tab. They are mutually exclusive and are only grouped together because I didn't want to make separate tabs for them.

Options

raqdm Options
This is a collection of other options that can be set.

  • Request Type - Currently limited to only rawDataNotify.
  • Output Format - Allows you to select the format of the returned data.
  • Duration - Allows you to limit results to those of a specific duration.
  • FRM/FEM Only - If checked, only FRM/FEM results will be returned.

Buttons

raqdm Buttons
These buttons provide access to several ways to use the GUI.

  • Cancel - Close the GUI without returning any result
  • Create Function - Create a function call that would request the selected data. The result will be printed to the console and returned as a string. This could be used to build function calls to include in your workflow.
  • Set Defaults - This will call setAQDMdefaults with the selected options, causing them to become default selections. For example, if you were going to be pulling data for several different parameters for one county, you could set defaults for the state, county, and date range. Then you could make several getAQDMdata() requests while only passing the parameter each time.
  • Request Data - This will use your selection to make a data request to AQDM with getAQDMdata(). If using this button, be sure to assign the result of openAQDMgui() to a variable so that the request id can be used to retrieve the data later. For example,
  x <- openAQDMgui()

will return the request id to x after you click Request Data.

Examples

library(raqdm)

# Set my username and password for the AQDM service
  setAQDMuser("ebailey@idem.in.gov", "my_password", save = TRUE)

# Set defaults for Wisconsin in 2014
  setAQDMdefaults(state = "55", bdate = "20140101", edate = "20141231")

# Request 2014 Benzene Data from Wisconsin
  x <- getAQDMdata(param="45201")
  
# Request 2014 NO2 Data from Wisconsin
  y <- getAQDMdata(param="42602")
  
# Request 2014 Ozone Data from Wisconsin
  z <- getAQDMdata(param="44201")
  
# Retrieve the benzene data
  benz <- getAQDMrequest(x)
  
# Retrieve the NO2 data
  no2 <- getAQDMrequest(y)
  
# Retrieve the Ozone data
  o3 <- getAQDMrequest(z)
  
# Example showing how to use loops to do the same thing as the previous example

library(raqdm)

# Set my username and password for the AQDM service
  setAQDMuser("ebailey@idem.in.gov", "my_password", save = TRUE)

# Set defaults for Wisconsin in 2014
  setAQDMdefaults(state = "55", bdate = "20140101", edate = "20141231")

# Create a vector with the parameters you are interested in
params <- c("45201", "42602", "44201")

# Use lapply to loop through the params vector, requesting each one from AQDM. A list of requests will be returned to the x variable
x <- lapply(params, function(p) {
  return(getAQDMdata(param=p))
})
    
# now loop through the requests to retrieve the data
y <- lapply(x, function(r) {
  return(getAQDMrequest(r))
})

# You could then use do.call and rbind to combine them into one data.frame
d <- do.call(rbind, y)
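
Once the results are stacked, it is no longer obvious which request each row came from, and a failed retrieval in the list would get in the way. The sketch below uses toy data in place of real AQDM results to show one way to handle both. The `value` and `param` columns are invented for the illustration and are not the columns AQDM returns.

```r
# Toy stand-ins for the data.frames returned by getAQDMrequest();
# the column names are invented for this illustration.
params  <- c("45201", "42602", "44201")
results <- list(
  data.frame(value = c(0.1, 0.2)),
  NULL,                              # pretend this retrieval failed
  data.frame(value = 0.04)
)
names(results) <- params

# Keep only the elements that really are data.frames
ok <- Filter(is.data.frame, results)

# Tag each row with the parameter code of the request it came from
tagged <- Map(function(df, p) {
  df$param <- p
  df
}, ok, names(ok))

# Stack everything into one data.frame
combined <- do.call(rbind, unname(tagged))
```

Note that rbind() requires every data.frame to have the same columns, which is why the tag is added to each one before combining.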

More Info

If you have any questions about this package, or you find an error, please contact me at the email address in my profile or open an issue here.