MazamaScience/RAWSmet

harmonize wrcc and fw13 metadata creation

Closed this issue · 0 comments

In the early stages of a project like this, we don't know how far we can force data from multiple sources into the same data model. But we're far enough along that we can now do this.

I created metadata for Oregon from the FW13 site and from WRCC. Here is what I see:

> names(or_fw13)
[1] "nwsID"       "longitude"   "latitude"    "elevation"   "siteName"    "countryCode"
[7] "stateCode"   "timezone"   
> names(or_wrcc)
 [1] "countryCode" "stateCode"   "stationID"   "siteName"    "longitude"  
 [6] "latitude"    "elevation"   "nessID"      "nwsID"       "agency"     
[11] "timezone" 

Now is a good time to bring these two into harmony with the following steps:

  • rename wrcc meta stationID to wrccID throughout the entire codebase
  • add columns with properly typed missing values (e.g. as.character(NA)) to the fw13 metadata: nessID, wrccID, agency (Note that anything with "ID" will always be character and never numeric. James Bond is "007", not 7.)
  • Reorder columns for both kinds of metadata in identifiers-location-time-metadata order: nwsID, wrccID, nessID, siteName, longitude, latitude, elevation, countryCode, stateCode, agency
  • See if there is a reliable way to remove the state from the wrcc siteName.
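
A minimal sketch of the first three steps, to make the target structure concrete. `harmonizeMeta()` is a hypothetical helper, not an existing RAWSmet function, and the two one-row data frames are made-up stand-ins for `or_fw13` and `or_wrcc`. The order list above does not mention `timezone`, so keeping it (and any other unlisted column) at the end is an assumption.

```r
# Target column order taken from the list above
targetColumns <- c(
  "nwsID", "wrccID", "nessID", "siteName",
  "longitude", "latitude", "elevation",
  "countryCode", "stateCode", "agency"
)

# Hypothetical helper (not part of RAWSmet)
harmonizeMeta <- function(meta) {
  # WRCC metadata: stationID becomes wrccID
  names(meta)[names(meta) == "stationID"] <- "wrccID"

  # Add every missing target column as a properly typed character NA
  for (col in setdiff(targetColumns, names(meta))) {
    meta[[col]] <- rep(NA_character_, nrow(meta))
  }

  # IDs are always character. Note that as.character() cannot restore
  # leading zeros already lost to a numeric type, so IDs should be read
  # in as character upstream.
  for (col in c("nwsID", "wrccID", "nessID")) {
    meta[[col]] <- as.character(meta[[col]])
  }

  # Assumption: unlisted columns (e.g. timezone) are kept after the listed ones
  extraColumns <- setdiff(names(meta), targetColumns)
  meta[, c(targetColumns, extraColumns), drop = FALSE]
}

# Made-up example rows with the column names shown in the console output above
fw13 <- data.frame(
  nwsID = "000001", longitude = -120.5, latitude = 44.0, elevation = 1000,
  siteName = "EXAMPLE", countryCode = "US", stateCode = "OR",
  timezone = "America/Los_Angeles", stringsAsFactors = FALSE
)
wrcc <- data.frame(
  countryCode = "US", stateCode = "OR", stationID = "xxxx",
  siteName = "EXAMPLE", longitude = -120.5, latitude = 44.0,
  elevation = 1000, nessID = "0000AAAA", nwsID = "000001",
  agency = "XYZ", timezone = "America/Los_Angeles",
  stringsAsFactors = FALSE
)

fw13Meta <- harmonizeMeta(fw13)
wrccMeta <- harmonizeMeta(wrcc)
```

With this sketch, `names(fw13Meta)` and `names(wrccMeta)` come out the same, and the FW13 columns wrccID, nessID and agency are character NAs.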

Then we will have completely identical data frame structures (with lots of NAs) regardless of where we get the data from.

This kind of guarantee will greatly reduce the amount of testing and error handling code we have to write downstream.
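
To show what downstream code could rely on, here is a sketch of a single structure check. `sameStructure()` is a hypothetical helper, not an existing RAWSmet function, and the small data frames are made up for illustration.

```r
# Hypothetical helper (not part of RAWSmet): TRUE when two data frames have
# the same column names in the same order and the same primary column classes
sameStructure <- function(a, b) {
  firstClass <- function(x) class(x)[1]
  identical(names(a), names(b)) &&
    identical(
      vapply(a, firstClass, character(1)),
      vapply(b, firstClass, character(1))
    )
}

# Made-up frames: metaA and metaB match, metaC stores nwsID as numeric
metaA <- data.frame(nwsID = "007", elevation = 100, stringsAsFactors = FALSE)
metaB <- data.frame(nwsID = NA_character_, elevation = 250, stringsAsFactors = FALSE)
metaC <- data.frame(nwsID = 7, elevation = 100, stringsAsFactors = FALSE)

sameStructure(metaA, metaB)
sameStructure(metaA, metaC)
```

The first call returns TRUE because a character NA still counts as character. The second returns FALSE because a numeric nwsID breaks the "IDs are always character" rule.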