traitecoevo/traits.build

Add `collection_date` to variables that make `observation_id`?

Closed this issue · 2 comments

For the invert database there are a few cases where collection_date is used to distinguish repeat measurements. I was wondering whether we should add a "replicate observations" context in these situations, or whether we should add collection_date to the pivot variables. Previously Lizzy had concerns that it might not work as a pivot variable because it is continuous and sometimes includes a time. I'm imagining it would be okay if it were always stored as character type?
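To make the pivoting concern concrete, here is a minimal Python sketch. It is not the traits.build code; the `pivot_wider` helper, the trait name `body_length`, and the values are all hypothetical. It assumes `collection_date` is held as a character string, so a date that includes a time is simply another distinct key.

```python
from collections import defaultdict


def pivot_wider(rows, key_vars):
    """Spread long trait rows into one wide record per combination of key_vars.

    Returns (wide, collisions); a collision is a second value landing in a
    cell that is already filled.
    """
    wide = defaultdict(dict)
    collisions = []
    for row in rows:
        key = tuple(str(row[v]) for v in key_vars)
        trait = row["trait_name"]
        if trait in wide[key]:
            collisions.append((key, trait))
        else:
            wide[key][trait] = row["value"]
    return dict(wide), collisions


# Hypothetical repeat measurements on one individual; the second date includes a time.
rows = [
    {"observation_id": "01", "trait_name": "body_length", "value": 4.1,
     "collection_date": "2020-01-15"},
    {"observation_id": "01", "trait_name": "body_length", "value": 4.6,
     "collection_date": "2020-03-02 14:30"},
]

without_date, clashes_without = pivot_wider(rows, ["observation_id"])
with_date, clashes_with = pivot_wider(rows, ["observation_id", "collection_date"])

print(len(without_date), clashes_without)
print(len(with_date), clashes_with)
```

Without the date, the two measurements compete for the same cell. With the date as a character key, each measurement keeps its own row, whether or not the string includes a time.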

@ehwenk's thoughts about adding it:

I'm trying to reason through the best answer for collection_date, in particular how the "meaning" of collection_date changes if it is an "observation_id-distinguisher". It would then need to be added to the list of variables that generate observation_id, because it isn't just about pivoting, but about how it informs what counts as an observation. And I think that is fine, in fact better. In my ontology thought-charts, I do have collection_date as an alternative to temporal_context for informing about "time", but the way it was input, it didn't actually get used, just "reported".
The one complication arises if there are trait-level collection_date entries. So far it has always been dataset-level, although I know we talked about allowing this. If there are different collection_date columns for different traits, then they'd become separate observation_ids but retain the same individual_id (since that would be based on the row), so I think that is fine.
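The trait-level case can be sketched as follows. This is a simplified illustration in Python, not how traits.build generates its identifiers: the `assign_observation_ids` helper, the reduced list of identifying variables, and the trait names are assumptions made for the example.

```python
def assign_observation_ids(rows, id_vars):
    """Number each distinct combination of id_vars within an individual."""
    ids_by_individual = {}
    labelled = []
    for row in rows:
        known = ids_by_individual.setdefault(row["individual_id"], {})
        key = tuple(row[v] for v in id_vars)
        if key not in known:
            # Next sequential id for this individual, zero-padded.
            known[key] = f"{len(known) + 1:02d}"
        labelled.append({**row, "observation_id": known[key]})
    return labelled


# Hypothetical row for one individual where each trait has its own collection date.
rows = [
    {"individual_id": "01", "life_stage": "adult",
     "trait_name": "body_length", "collection_date": "2020-01-15"},
    {"individual_id": "01", "life_stage": "adult",
     "trait_name": "body_mass", "collection_date": "2020-03-02"},
]

before = assign_observation_ids(rows, ["life_stage"])
after = assign_observation_ids(rows, ["life_stage", "collection_date"])

ids_before = [r["observation_id"] for r in before]
ids_after = [r["observation_id"] for r in after]
individuals_after = {r["individual_id"] for r in after}

print(ids_before, ids_after, individuals_after)
```

Once collection_date is part of the key, the two traits fall into separate observations, while individual_id is untouched because it is not derived from the date.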

More thoughts from Lizzy:

I just thought of a problem with including collection_date in observation_id! For response curve data, there is a time column that is of course different for each point, so each point would become a different observation.

I guess since this is an expert use case, I'll simply add a note for these datasets to ensure that the column mapped to collection_date is not the per-measurement time stamp.

The time stamp of repeat measurements is usually ignored (not added as a context), because otherwise each repeat measurement would get a different observation_id.
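A small sketch of the response curve problem, again in Python and with hypothetical names: `count_observations`, `sampling_date`, and `measurement_time` are invented for the illustration, and the key is reduced to two variables.

```python
def count_observations(rows, id_vars):
    """Count distinct combinations of id_vars, i.e. how many observations result."""
    return len({tuple(row[v] for v in id_vars) for row in rows})


# Hypothetical response curve: four points measured minutes apart on one individual.
curve = [
    {"individual_id": "01",
     "sampling_date": "2021-02-10",
     "measurement_time": f"2021-02-10 09:{minute:02d}"}
    for minute in (0, 5, 10, 15)
]

# Mapping the per-point time stamp to collection_date splits the curve apart.
split = count_observations(curve, ["individual_id", "measurement_time"])

# Mapping a date shared by the whole curve keeps it as one observation.
kept = count_observations(curve, ["individual_id", "sampling_date"])

print(split, kept)
```

With the per-point time stamp in the key, the four points become four observations. With a date shared by the whole curve, they stay together as one.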

Also include original_name