Meet the example problem
It's time to meet the data analysis challenge for this course! Over the next series of issues, you'll connect with the USGS National Water Information System (NWIS) web service to learn about some of the longest-running monitoring stations in USGS streamgaging history.
The repository for this course is already set up with a basic targets data pipeline that:
- Queries NWIS to find the oldest discharge gage in each of three Upper Midwest states
- Maps the state-winner gages
⌨️ Activity: Switch to a new branch
Before you edit any code, create a local branch called "three-states" and push that branch up to the remote location "origin" (which is the GitHub host of your repository).
```bash
git checkout main
git pull origin main
git checkout -b three-states
git push -u origin three-states
```
The first two lines aren't strictly necessary when you don't have any new branches, but it's a good habit to head back to main
and sync with "origin" whenever you're transitioning between branches and/or PRs.
Comment on this issue once you've created and pushed the "three-states" branch.
Ready!
⌨️ Activity: Explore the starter pipeline
Without modifying any code, start by inspecting and running the existing data pipeline.
- Open up `_targets.R` and read through - can you guess what will happen when you build the pipeline?
- Build all targets in the pipeline.
- Check out the contents of `oldest_active_sites`.
💡 Refresher hints:
- To build a pipeline, run `library(targets)` and then `tar_make()`.
- To assign an R-object pipeline target to your local environment, run `tar_load(mytarget)`. This function will load the object in its current state.
- If you want to make sure you have the most up-to-date version of the target, you can have targets check for currentness or rebuild first by running `tar_make(mytarget)` and then using `tar_load(mytarget)`.
- You'll pretty much always want to call `library(targets)` in your R session while developing pipeline code - otherwise, you need to call `targets::tar_make()` in place of `tar_make()` anytime you run that command, and all that extra typing can add up.
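Putting these hints together, a first exploration session might look like the following sketch. It assumes you run it from the repository root, where `_targets.R` lives, and that the starter pipeline defines `oldest_active_sites` as described above:

```r
library(targets)

# Build every target defined in _targets.R (targets that are
# already up to date will be skipped)
tar_make()

# Pull the finished target into your R session and inspect it
tar_load(oldest_active_sites)
oldest_active_sites
```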
When you're satisfied that you understand the current pipeline, include the value of `oldest_active_sites$site_no` and the image from `site_map.png` in a comment on this issue.
Add a comment to this issue to proceed.
⌨️ Activity: Spot the split-apply-combine
Hey, did you notice that there's a split-apply-combine action happening in this repo already?
Check out the `find_oldest_sites()` function:

```r
find_oldest_sites <- function(states, parameter) {
  purrr::map_df(states, find_oldest_site, parameter)
}
```
This function:
- splits `states` into each individual state
- applies `find_oldest_site` to each state
- combines the results back into a single `tibble`

and it all happens in just one line! The split-apply-combine operations we'll be exploring in this course require more code and are more useful for slow or fault-prone activities, but they follow the same general pattern.
Check out the documentation for `map_df` at `?purrr::map_df` or online here if this function is new to you.
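If you'd like to see the pattern outside the course repo, here's a minimal, self-contained sketch. The `fake_oldest_site()` function is invented for illustration only; it stands in for `find_oldest_site()`:

```r
library(purrr)

# Hypothetical stand-in for find_oldest_site(): returns one row per state
fake_oldest_site <- function(state) {
  data.frame(state = state, n_chars = nchar(state))
}

states <- c("WI", "MN", "MI")

# Split states, apply the function to each, combine rows into one data frame
oldest <- purrr::map_df(states, fake_oldest_site)
oldest
```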
When you're ready, comment again on this issue.
Ready!
⌨️ Activity: Apply a downloading function to each state
Awesome, time for your first code changes ✏️.
- Write three targets in `_targets.R` to apply `get_site_data()` to each state in `states` (insert these new targets under the `# TODO: PULL SITE DATA HERE` placeholder in `_targets.R`). The targets should be named `wi_data`, `mn_data`, and `mi_data`. `oldest_active_sites` should be used for the `sites_info` argument in `get_site_data()`.
- Add a call to `source()` near the top of `_targets.R` as needed to make your pipeline executable.
- Test it: you should be able to run `tar_make()` with no arguments to get everything built.
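For orientation, here's a rough sketch of what those targets might look like. Only `sites_info` is confirmed above; the other argument names (`state`, `parameter`) are guesses, so check `get_site_data()`'s actual signature in the repo before copying:

```r
# Inside the target list in _targets.R, under "# TODO: PULL SITE DATA HERE".
# Argument names other than sites_info are assumptions; verify against get_site_data().
tar_target(
  wi_data,
  get_site_data(sites_info = oldest_active_sites, state = "WI", parameter = parameter)
),
tar_target(
  mn_data,
  get_site_data(sites_info = oldest_active_sites, state = "MN", parameter = parameter)
),
tar_target(
  mi_data,
  get_site_data(sites_info = oldest_active_sites, state = "MI", parameter = parameter)
),
```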
💡 Hint: the `get_site_data()` function already exists and shouldn't need modification. You can find it by browsing the repo or by pressing Ctrl+Shift+F in RStudio and then searching for "get_site_data".
When you're satisfied with your code, open a PR to merge the "three-states" branch into "main". Make sure to add `_targets/*`, `3_visualize/out/*`, and any `.DS_Store` files to your `.gitignore` file before committing anything. In the description box for your PR, include a screenshot or transcript of your console session where the targets get built.