harmonize wrcc and fw13 metadata creation
jonathancallahan opened this issue · 0 comments
jonathancallahan commented
In the early stages of a project like this, we don't know how much we can force data from multiple sources into the same data model. But we're far enough along that we can now do this.
I created metadata for Oregon from the FW13 site and from WRCC. Here is what I see:
> names(or_fw13)
[1] "nwsID" "longitude" "latitude" "elevation" "siteName" "countryCode"
[7] "stateCode" "timezone"
> names(or_wrcc)
[1] "countryCode" "stateCode" "stationID" "siteName" "longitude"
[6] "latitude" "elevation" "nessID" "nwsID" "agency"
[11] "timezone"
Now is a good time to bring these two into harmony with the following steps:
- Rename the wrcc metadata column `stationID` to `wrccID` throughout the entire codebase.
- Add columns with properly typed missing values (e.g. `as.character(NA)`) to the fw13 metadata: `nessID`, `wrccID`, `agency`. (Note that anything with "ID" will always be character and never numeric. James Bond is "007", not 7.)
- Reorder columns for both kinds of metadata in identifiers-location-time-metadata order: `nwsID, wrccID, nessID, siteName, longitude, latitude, elevation, countryCode, stateCode, agency`
- See if there is a reliable way to remove the state from the wrcc `siteName`.
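The first three steps could be sketched roughly as below. `harmonizeMeta()` is a hypothetical helper, not a function in the current codebase, and the example rows are placeholders rather than real station records. The listed order does not mention `timezone`; putting it just before `agency` is an assumption based on the "time" slot in the ordering. Removing the state from `siteName` is left out because it depends on how WRCC formats the names.

```r
# Hypothetical helper: bring fw13 or wrcc metadata into one common structure.
harmonizeMeta <- function(meta) {
  # Step 1: wrcc metadata calls this column "stationID"; rename it to "wrccID".
  names(meta)[names(meta) == "stationID"] <- "wrccID"

  # Step 2: add any missing character columns as typed NA values.
  for (name in setdiff(c("nessID", "wrccID", "agency"), names(meta))) {
    meta[[name]] <- rep(NA_character_, nrow(meta))
  }

  # IDs are always character. Leading zeros that were already lost by an
  # earlier numeric conversion cannot be recovered here.
  for (name in c("nwsID", "wrccID", "nessID")) {
    meta[[name]] <- as.character(meta[[name]])
  }

  # Step 3: identifiers, location, time, metadata.
  # ASSUMPTION: "timezone" goes before "agency"; it is not in the listed order.
  columnOrder <- c(
    "nwsID", "wrccID", "nessID", "siteName",
    "longitude", "latitude", "elevation",
    "countryCode", "stateCode", "timezone", "agency"
  )
  meta[, columnOrder, drop = FALSE]
}

# Placeholder rows that only mimic the column layout shown above.
fw13_example <- data.frame(
  nwsID = "000001", longitude = -120.5, latitude = 44.0, elevation = 1000,
  siteName = "EXAMPLE SITE", countryCode = "US", stateCode = "OR",
  timezone = "America/Los_Angeles",
  stringsAsFactors = FALSE
)
wrcc_example <- data.frame(
  countryCode = "US", stateCode = "OR", stationID = "orXXXX",
  siteName = "EXAMPLE SITE", longitude = -120.5, latitude = 44.0,
  elevation = 1000, nessID = "ABCD1234", nwsID = "000001",
  agency = "EXAMPLE", timezone = "America/Los_Angeles",
  stringsAsFactors = FALSE
)

fw13_meta <- harmonizeMeta(fw13_example)
wrcc_meta <- harmonizeMeta(wrcc_example)
identical(names(fw13_meta), names(wrcc_meta))
```

Keeping the column order in one place means both metadata creators pick up any later change to the ordering automatically.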
Then we will have completely identical data frame structures (with lots of `NA`s) regardless of where we get the data from.
This kind of guarantee will greatly reduce the amount of testing and error handling code we have to write downstream.
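A guarantee like this could be checked in a unit test. The sketch below uses a hypothetical `sameStructure()` helper and placeholder values. It also shows why the typed `as.character(NA)` matters: a bare `NA` creates a logical column, which would break the match.

```r
# Hypothetical helper: TRUE when two data frames have the same column names
# in the same order and the same column classes.
sameStructure <- function(a, b) {
  identical(names(a), names(b)) &&
    identical(lapply(a, class), lapply(b, class))
}

# Placeholder metadata with only two columns, for illustration.
typedNA   <- data.frame(nwsID = "007", agency = as.character(NA), stringsAsFactors = FALSE)
filled    <- data.frame(nwsID = "123", agency = "EXAMPLE", stringsAsFactors = FALSE)
untypedNA <- data.frame(nwsID = "007", agency = NA, stringsAsFactors = FALSE)
numericID <- data.frame(nwsID = 7, agency = "EXAMPLE", stringsAsFactors = FALSE)

sameStructure(typedNA, filled)    # TRUE: both columns are character
sameStructure(untypedNA, filled)  # FALSE: bare NA makes agency logical
sameStructure(numericID, filled)  # FALSE: nwsID is numeric, not character
```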