DOI-USGS/national-flow-observations

Empty UV data causes error during combine

lindsayplatt opened this issue · 6 comments

I'm not sure how this is happening, but one of the UV files doesn't have any data. This causes an error in the combine_nwis_data() function at the end of the UV task makefile in 10_nwis_uv_pull_tasks.yml: it tries to pivot the data using convert_to_long() and fails because the expected columns are missing.

> dat <- qs::qread("10_nwis_pull/tmp/uv_230505_137.qs")
> dat
# A tibble: 0 x 0
> convert_to_long(dat)
 Error: Can't subset columns that don't exist.
x Column `agency_cd` doesn't exist.
Run `rlang::last_error()` to see where the error occurred. 

I'm not sure why this single file is empty. The value of the partition information stored as uv_partition_230505_137 before passing to get_nwis_data() is

# A tibble: 1 x 4
  site_no  count_nu PullTask   PullDate  
  <chr>       <dbl> <chr>      <chr>     
1 03501975   350496 230505_137 2023-05-05

So we would expect to see a lot of data. But if you manually pull data and check the inventory with the appropriate params, it correctly shows no data. So, it would appear that somehow the inventory value in count_nu next to this site is incorrect. I'm not sure how to resolve that.

> readNWISuv("03501975", parameterCd = "00060", startDate = "", endDate = "")
data frame with 0 columns and 0 rows

> whatNWISdata(siteNumber = "03501975", service = "uv", parameterCd = "00060")
 [1] agency_cd          site_no            station_nm         site_tp_cd        
 [5] dec_lat_va         dec_long_va        coord_acy_cd       dec_coord_datum_cd
 [9] alt_va             alt_acy_va         alt_datum_cd       huc_cd            
[13] data_type_cd       parm_cd            stat_cd            ts_id             
[17] loc_web_ds         medium_grp_cd      parm_grp_cd        srs_id            
[21] access_cd          begin_date         end_date           count_nu          
<0 rows> (or 0-length row.names)
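
One way to catch this kind of mismatch before the combine step is to compare the inventory's count_nu against what each pull actually returned. The sketch below is only an illustration in base R: the function name `flag_inventory_mismatch()` and the `pulled_rows` vector are hypothetical (not part of this repo), and it assumes one site per pull task, as in the partition shown above.

```r
# Flag pull tasks whose inventory promised records but whose pull came back empty.
# `partition` mirrors the columns of the partition tibble shown above.
# `pulled_rows` is a hypothetical named numeric vector: row counts keyed by
# PullTask (for example, nrow() of each saved file). Assumes one site per task.
flag_inventory_mismatch <- function(partition, pulled_rows) {
  actual <- pulled_rows[partition$PullTask]
  actual[is.na(actual)] <- 0  # tasks with no recorded pull count as empty
  partition$pulled_rows <- unname(actual)
  partition[partition$count_nu > 0 & partition$pulled_rows == 0, , drop = FALSE]
}

partition <- data.frame(
  site_no = "03501975",
  count_nu = 350496,
  PullTask = "230505_137",
  PullDate = "2023-05-05",
  stringsAsFactors = FALSE
)
pulled_rows <- c("230505_137" = 0)

# Returns the row for site 03501975, since the inventory count is positive
# but the pull produced no rows.
flag_inventory_mismatch(partition, pulled_rows)
```

Flagged rows could then be logged or re-inventoried rather than silently dropped, which would make a stale count_nu easier to spot.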

Definitely something wonky going on at that site. The data exist, but you can't get to the new gage pages for that site, and the discharge metadata look weird to me. I think it's something with the records, and I see the site has been discontinued, so that might have something to do with it. https://waterdata.usgs.gov/monitoring-location/03501975/#parameterCode=00065

But you can see the data at the old gage site (daily values) page: https://waterdata.usgs.gov/nwis/dv?referred_module=sw&site_no=03501975

Right, but I don't see any UV data on those pages, so it's weird that the UV inventory returned something for count_nu.

I need to keep moving forward with this for now. So, I am going to add a single line that skips over the code in convert_to_long() if the data input has 0 rows. Here's the line I added in the function for now:

[image: screenshot of the added line]
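
For readers without access to the original change, a guard of this kind might look roughly like the sketch below. This is an assumption, not the actual line that was added: `convert_to_long_stub()` is a hypothetical stand-in for the repo's convert_to_long() (whose body is not shown in this thread), and `convert_if_nonempty()` is an illustrative wrapper name.

```r
# Hypothetical stand-in for the repo's convert_to_long(); the real function is
# not shown in this thread. Like the real one, it fails when agency_cd is missing.
convert_to_long_stub <- function(dat) {
  stopifnot("agency_cd" %in% names(dat))
  dat
}

# Guarded wrapper: return zero-row input unchanged instead of trying to reshape it.
convert_if_nonempty <- function(dat, convert_fun = convert_to_long_stub) {
  if (nrow(dat) == 0) {
    return(dat)
  }
  convert_fun(dat)
}

empty <- data.frame()                 # mimics the 0 x 0 table read from the .qs file
result <- convert_if_nonempty(empty)  # no error; the empty table passes through
nrow(result)
```

The early return keeps the combine step moving, but it also hides the empty pull, so whatever binds the results together afterwards still has to tolerate a table with no columns.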

Oh weird: you can get dv data, but not stat code 00003, just 00001 and 00002.

readNWISdv(siteNumbers = "03501975", parameterCd = "00060", statCd = '00001', startDate = "", endDate = "")

[image]

Oh strange, I think we are skipping this entirely within the dv pull here then. Still so weird that my May 5 inventory claimed there were over 350k records and now there are 0. Maybe they were working on something with this site?

Yeah, maybe they're making these data available in the dv database, and the mean isn't an approved record yet. Not entirely sure, but the only reason it's being pulled by uv is when it doesn't exist in dv. Suggests to me that they are working on these records, as you suggest.

Got past this with some code in #30, but I'm not sure if there is some underlying issue we should investigate later, so leaving this open for now :)