USEPA/EPATADA

Create TADA.MonitoringLocationIdentifier in TADA_AutoClean

cristinamullin opened this issue · 4 comments

Is your feature request related to a problem? Please describe.

TADA.MonitoringLocationIdentifier is needed to group nearby sites. This column should be referenced in all TADA functions instead of the original MonitoringLocationIdentifier.

Describe the solution you'd like

Create TADA.MonitoringLocationIdentifier as an exact copy of the original MonitoringLocationIdentifier as part of TADA_AutoClean.
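As a rough illustration of the intended behavior only (TADA itself is an R package, and the real change would live inside TADA_AutoClean), here is a minimal Python sketch that treats a data frame as a list of row dictionaries. The function name `add_tada_site_id` is hypothetical.

```python
def add_tada_site_id(rows):
    """Hypothetical sketch: return copies of the rows with
    TADA.MonitoringLocationIdentifier set to an exact copy of the
    original MonitoringLocationIdentifier."""
    out = []
    for row in rows:
        new_row = dict(row)  # shallow copy so the caller's rows are not mutated
        new_row["TADA.MonitoringLocationIdentifier"] = row["MonitoringLocationIdentifier"]
        out.append(new_row)
    return out
```

Keeping the original column untouched means the grouping step can later overwrite only the TADA copy.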

Additional context

Related issue: #475

Reminders for TADA contributors addressing this issue

New features should include all of the following work:

  • Create the function/code.

  • Document all code using comments to describe what it does.

  • Create tests in tests folder.

  • Create help file using roxygen2 above code.

  • Create working examples in help file (via roxygen2).

  • Add to appropriate vignette (or create new one).

Should we also be thinking about adding a TADA.MonitoringLocationName column? I haven't run into this yet, but I am wondering if having two (or more) different MonitoringLocationNames for the same TADA.MonitoringLocationIdentifier (when sites have been grouped and given a concatenated identifier) will cause any problems with labeling figures.

Yes, I was ruminating about this over the weekend. How should we handle MonitoringLocationName, MonitoringLocationTypeName, LatitudeMeasure, LongitudeMeasure (we already have TADA versions of lat/lon, but which should we use for maps once the sites are grouped?), and any other site metadata that TADA will rely on later in the workflow for figures and analyses (Mod 3?)? It probably makes sense to have a TADA version of each of these that is updated as part of the nearby sites function.

It would be helpful to look at some examples with our current figure/map functions & think through how best to handle this.

When the metadata for both sites are the same, the fix is easy: we can copy it to the TADA version of those columns. I am not sure what we'll want to do if they differ: 1) let the user choose which one to keep, 2) concatenate them, or 3) include a flag that notifies the user that they differ?

For differing metadata, one idea I had was to use an org hierarchy like the one in TADA_FindPotentialDuplicatesMultipleOrgs. This would let users prioritize retaining metadata from their own org when the paired nearby site is from a different org. For nearby sites from the same org, the default selection could be based on ActivityStartDateTime (defaulting to either the oldest or the most recent), on the number of results (the metadata associated with the most results would be selected), or the metadata could be selected at random.
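One way to picture the org-hierarchy idea is the Python sketch below. It is not how TADA_FindPotentialDuplicatesMultipleOrgs is implemented; it assumes each grouped site is a dictionary with an `OrganizationIdentifier` and a hypothetical `n_results` count, ranks sites by their org's position in a user-supplied hierarchy, and breaks ties within an org by the number of results (only one of the tie-breakers suggested above; the date-based and random options are not shown).

```python
def select_site_metadata(sites, org_hierarchy):
    """Hypothetical sketch: pick the site record whose metadata is kept
    for a group of nearby sites.

    sites: list of dicts with 'OrganizationIdentifier' and 'n_results'.
    org_hierarchy: org IDs in priority order (first = highest priority).
    """
    def rank(site):
        org = site["OrganizationIdentifier"]
        # Orgs missing from the hierarchy sort after every listed org.
        if org in org_hierarchy:
            org_rank = org_hierarchy.index(org)
        else:
            org_rank = len(org_hierarchy)
        # Within the same org rank, more results wins (negated so min() prefers it).
        return (org_rank, -site["n_results"])

    # min() returns the first best-ranked site if there is a complete tie.
    return min(sites, key=rank)


sites = [
    {"OrganizationIdentifier": "ORG_B", "MonitoringLocationName": "Creek at Bridge", "n_results": 50},
    {"OrganizationIdentifier": "ORG_A", "MonitoringLocationName": "Creek Br", "n_results": 10},
    {"OrganizationIdentifier": "ORG_A", "MonitoringLocationName": "Creek Bridge", "n_results": 30},
]
chosen = select_site_metadata(sites, ["ORG_A", "ORG_B"])
```

Here `chosen` is the ORG_A site with 30 results: ORG_A outranks ORG_B, and among the ORG_A sites the one with more results wins. With an empty hierarchy, the ORG_B site with 50 results would be kept instead.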

I think a flag would be useful. If we did use some default way of selecting the metadata, the flag could both notify the user they were different and explain which selection had been made.
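A flag that both reports a difference and records the selection could be as simple as the Python sketch below. The function name `metadata_flag` and the message wording are hypothetical; it returns no flag when all grouped sites agree, which corresponds to the easy "copy it over" case.

```python
def metadata_flag(field, values, selected):
    """Hypothetical sketch: describe differing metadata among grouped
    sites and which value was retained; return None if they all agree."""
    distinct = sorted(set(values))
    if len(distinct) <= 1:
        return None  # metadata identical across the group, nothing to flag
    return (
        f"{field} differed among grouped sites ({'; '.join(distinct)}); "
        f"retained '{selected}'."
    )
```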

Currently, if TADA_FindNearbySites has already been run on a TADA df (and site groupings were found) and you try to run it again, there will be an error because there are duplicate TADA.MonitoringLocationIdentifiers after filtering for unique combinations of TADA.MonitoringLocationIdentifier, TADA.LatitudeMeasure, and TADA.LongitudeMeasure. I think that standardizing the metadata will resolve this issue as well.
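To make the failure mode concrete, the Python sketch below mimics the filter on unique identifier/lat/lon combinations and shows that a grouped site keeping two sets of coordinates yields a duplicated identifier. It assumes a comma-joined concatenated identifier (the actual separator may differ), and the `standardize_coordinates` helper is hypothetical: it simply copies the first row's coordinates to every row in a group, where an org hierarchy or another rule could be used instead.

```python
from collections import Counter


def duplicate_tada_ids(rows):
    """Return identifiers that appear more than once after filtering for
    unique (identifier, latitude, longitude) combinations."""
    combos = {
        (r["TADA.MonitoringLocationIdentifier"], r["TADA.LatitudeMeasure"], r["TADA.LongitudeMeasure"])
        for r in rows
    }
    counts = Counter(site_id for site_id, _lat, _lon in combos)
    return sorted(site_id for site_id, n in counts.items() if n > 1)


def standardize_coordinates(rows):
    """Hypothetical fix: give every row in a group the coordinates of the
    first row seen for that TADA.MonitoringLocationIdentifier."""
    first = {}
    out = []
    for r in rows:
        sid = r["TADA.MonitoringLocationIdentifier"]
        first.setdefault(sid, (r["TADA.LatitudeMeasure"], r["TADA.LongitudeMeasure"]))
        lat, lon = first[sid]
        out.append({**r, "TADA.LatitudeMeasure": lat, "TADA.LongitudeMeasure": lon})
    return out


rows = [
    {"TADA.MonitoringLocationIdentifier": "SiteA,SiteB", "TADA.LatitudeMeasure": 38.9001, "TADA.LongitudeMeasure": -77.0301},
    {"TADA.MonitoringLocationIdentifier": "SiteA,SiteB", "TADA.LatitudeMeasure": 38.9002, "TADA.LongitudeMeasure": -77.0302},
    {"TADA.MonitoringLocationIdentifier": "SiteC", "TADA.LatitudeMeasure": 39.1, "TADA.LongitudeMeasure": -76.5},
]
```

`duplicate_tada_ids(rows)` reports the grouped identifier, while running it on the standardized rows reports nothing, which is why standardizing the site metadata should also let the nearby sites step be rerun.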