Drainage areas of each county and HU
thjchen opened this issue · 5 comments
Hello,
I'm looking to acquire drainage area data for various distances of each county. I've made some progress by aggregating the upstream catchments of all flowlines across each county, but it currently takes about 3 minutes to obtain the drainage area for one county at a specific distance. With numerous counties and distances to cover, this process would take weeks. I'm wondering if there are more efficient methods to obtain the drainage areas.
Here's the code I'm currently using:
## Setup geometry
county_shapes_cb <- counties() # get county boundaries by `tigtis`
filepath <- "data/geography/NHDPlusNationalData/NHDPlusV21_National_Seamless_Flattened_Lower48.gdb"
st_layers(filepath)
catchments_all <- st_read(filepath, layer = "Catchment") # get hydrological data from downloaded data
## Obtain the upstream basin for one sample county at a certain distance
county_code = 2484 #COUNTYNS=01065575, Belmont
distance = 500
# Find flowlines across the county boundary
flowlines_county <- get_nhdplus(AOI = county_shapes_cb[county_code,], realization='flowline')
crosses <- st_crosses(flowlines_county, county_shapes_cb[county_code,])
crosses <- vapply(crosses, function(x) length(x) > 0, TRUE)
flowlines_crossed <- flowlines_county[crosses, ]
# Get the upstream region by aggregating the upstream catchments of all flowlines across the county boundary
catchment <- data.frame()
for (flowline_code in 1:nrow(flowlines_crossed)) {
# Get the upstream catchments of a flowline
start_comid <- flowlines_crossed[flowline_code,2]$comid
start_point <- flowlines_crossed[flowline_code,2]$geometry
flowline <- navigate_nldi(list(featureSource = "comid",
featureID = start_comid),
mode = "upstreamTributaries", #think about this:"upstreamMain"
distance_km = distance)
flowline$UT$nhdplus_comid
subset <- catchments_all %>%
filter(FEATUREID %in% as.integer(flowline$UT$nhdplus_comid))
catchment <- rbind(catchment, subset)
}
# Join all the catchments
upstream_region <- st_make_valid(catchment) %>%
distinct() %>%
summarise()
Additionally, I'm curious if there's a way to determine the hydrologic units that flow into each hydrologic unit. For example, is there a way to identify the HUC6s that flow into a specific HUC6?
Hi There -- you could scale this analysis up with a couple modifications to get better performance.
Instead of using get_nhdplus()
, which calls a web service, you could do a spatial intersection between the national layer of flowlines and the county polygons. The geos package has some very high performance join methods. https://paleolimbot.github.io/geos/
Instead of using the NLDI, which calls a web service, you could use sort_network()
from hydroloom
https://doi-usgs.github.io/hydroloom/reference/sort_network.html which will do an upstream with tributaries navigation (return a subset of the whole network upstream of each outlet) for each outlet you provide.
With those two modifications, you should be able to run your entire analysis in one go without the need to iterate. It will take a bit more memory, but it will be orders of magnitude faster.
Thank you!! I'll try.
A quick other question- if there's a way to determine the hydrologic units that flow into each hydrologic unit. For example, is there a way to identify the HUC6s that flow into a specific HUC6?
Oh -- sorry for missing that bit. There is, but it's not 100%.
The 102020wbd_outlets.gpkg here:
Blodgett, D.L., 2023, Mainstem Rivers of the Conterminous United States (ver. 2.0, February 2023): U.S. Geological Survey data release, https://doi.org/10.5066/P92U7ZUT.
Contains attributes that point to the next down HU at a variety of levels. Those attributes are derived from outlet HUC12s that flow to a different HUC12 then aggregated up. It's not 100% because there are some HUC12s that don't plumb with the river network -- some have multiple outlets and some don't flow. That data release will probably work for all the HUC6s but there will be cases at the 10 and 12 where things are missing.
Yeah -- based on your map, that would be the case. That table is the result of identifying outlet locations along the network that are unambiguous and in agreement with where the HUC topology table says things go. So missing data is expected where things are ambiguous.