Add `collection_date` to variables that make `observation_id`?
Closed this issue · 2 comments
For the invert database there are a few cases where `collection_date` is used for repeat measurements. I was wondering whether we should add a "replicate observations" context in these situations, or whether we should instead try adding `collection_date` to the pivot variables. Previously Lizzy had concerns that it might not work as a pivot variable because it is continuous and sometimes includes time. I'm imagining it would be okay if it were all character type?
@ehwenk's thoughts about adding it:
I'm trying to reason through the best answer for `collection_date`. In particular, how the "meaning" of `collection_date` changes if it is an "observation_id-distinguisher". Then it would need to be added to the list of variables that generates `observation_id` - because it isn't just about pivoting, but more about how it informs what is an observation. And I think that is fine - in fact better. In my ontology thought-charts, I do have `collection_date` as an alternative to `temporal_context` for informing about "time", but the way it was input, it didn't actually get used, but just "reported".
The one complication comes if there are trait-level `collection_date` entries. So far it is always dataset-level, although I know we talked about allowing this. If there are different `collection_date` columns for different traits, then they'd become separate `observation_id`'s, but retain the same `individual_id` (since that would be based on the row), so I think that is fine.
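To make the effect concrete, here is a minimal Python sketch of how adding `collection_date` to the variables that generate `observation_id` would split repeat measurements. It is not the project's actual implementation: the function `assign_observation_ids`, the zero-padded id format, and the example rows are all assumptions made for illustration. Values are compared as character strings, in line with the "all character type" idea above.

```python
def assign_observation_ids(rows, key_vars):
    """Give each distinct combination of key_vars an id, in order of first appearance.

    Hypothetical helper: values are cast to str so that dates and
    date-times are compared as plain text.
    """
    seen = {}
    out = []
    for row in rows:
        key = tuple(str(row.get(v)) for v in key_vars)
        if key not in seen:
            seen[key] = f"{len(seen) + 1:02d}"
        out.append({**row, "observation_id": seen[key]})
    return out


# Made-up long-format rows: one individual, measured on two dates.
rows = [
    {"individual_id": "ind_1", "trait_name": "body_mass", "collection_date": "2020-01-15"},
    {"individual_id": "ind_1", "trait_name": "body_length", "collection_date": "2020-01-15"},
    {"individual_id": "ind_1", "trait_name": "body_mass", "collection_date": "2020-03-02"},
]

# Without collection_date, all three rows collapse into one observation.
without_date = assign_observation_ids(rows, ["individual_id"])

# With collection_date, the later visit becomes its own observation,
# while individual_id stays the same.
with_date = assign_observation_ids(rows, ["individual_id", "collection_date"])

print([r["observation_id"] for r in without_date])  # ['01', '01', '01']
print([r["observation_id"] for r in with_date])     # ['01', '01', '02']
```

The same logic covers trait-level dates: rows that differ only in `collection_date` get separate `observation_id` values but keep their shared `individual_id`.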
More thoughts from Lizzy:
I just thought of a problem with including collection_date in observation_id!! For response curve data, there is a time column that is of course different for each point - and then those will become different observations.
I guess since this is an expert use case, I'll simply add a note for these to ensure the column mapped to collection_date is not the same as the time stamp for each measurement.
Time stamp of repeat measurements is usually ignored (not added as a context) because otherwise it would make each repeat measurement a different `observation_id`.
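The response curve problem can be sketched in Python as follows. This is only an illustration under assumed names: `count_observations`, `measurement_time`, and `sampling_date` are hypothetical, and the rows are made up. It counts how many distinct observations result depending on which column is mapped to `collection_date`.

```python
def count_observations(rows, key_vars):
    """Count distinct combinations of key_vars (hypothetical helper)."""
    return len({tuple(str(row[v]) for v in key_vars) for row in rows})


# Made-up response curve: three points measured on one leaf within a minute.
curve = [
    {"individual_id": "leaf_1", "measurement_time": "2021-02-03 10:00:05", "sampling_date": "2021-02-03"},
    {"individual_id": "leaf_1", "measurement_time": "2021-02-03 10:00:35", "sampling_date": "2021-02-03"},
    {"individual_id": "leaf_1", "measurement_time": "2021-02-03 10:01:05", "sampling_date": "2021-02-03"},
]

# Mapping the per-point time stamp to collection_date splits the curve
# into one observation per point.
split = count_observations(curve, ["individual_id", "measurement_time"])

# Mapping a date shared by the whole curve keeps it as a single observation.
kept = count_observations(curve, ["individual_id", "sampling_date"])

print(split, kept)  # 3 1
```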
Also include original_name