traitecoevo/traits.build

location_id should be NA for all `entity_type: species`

ehwenk opened this issue · 4 comments

As I'm writing up tutorials for traits.build, I've discovered that there are observations with entity_type: species that have a location_id.

They, correctly, do not have a population_id, but somehow are still assigned a location_id, presumably because they are read in from a row for which there is location information (for population-level or individual-level measurements).

We should:

  1. Actively set location_id to NA if entity_type: species - after all identifiers are created.
  2. Add a test to confirm that, within a dataset, location_id is always NA for species-level measurements.
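
A minimal sketch of the two steps above, written in plain Python rather than the package's own code. It assumes each measurement is a dict and that `None` stands in for NA; the function names and the sample rows are hypothetical and are not taken from traits.build.

```python
# Hypothetical sketch: each row is a plain dict and None stands in for NA.
# Column names follow the issue text; the sample data are invented.

def clear_species_location_ids(rows):
    """Step 1: blank location_id wherever entity_type is 'species'."""
    cleaned = []
    for row in rows:
        row = dict(row)  # copy so the input rows are left untouched
        if row.get("entity_type") == "species":
            row["location_id"] = None
        cleaned.append(row)
    return cleaned


def species_rows_with_location(rows):
    """Step 2: return the rows that break the rule (empty list means the test passes)."""
    return [
        row for row in rows
        if row.get("entity_type") == "species" and row.get("location_id") is not None
    ]


rows = [
    {"entity_type": "species", "population_id": None, "location_id": "01"},
    {"entity_type": "population", "population_id": "01", "location_id": "01"},
    {"entity_type": "individual", "population_id": "01", "location_id": "02"},
]

cleaned = clear_species_location_ids(rows)
assert species_rows_with_location(rows)           # before: one species row carries a location_id
assert species_rows_with_location(cleaned) == []  # after: none do
```

Running the cleanup only after all identifiers exist matters here, because population- and individual-level rows still need their location_id.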

yangsophieee commented

I've fixed this here. Species-level measurements now have their location_id removed, so no location-level metadata is assigned to them. @ehwenk, just checking: is this the intended workflow?

Test_2023_4 is an example of this. Lines 214-215 have two species measurements with repeat_measurements_id: TRUE. Previously, the "australia" location_name on the first measurement caused it to be assigned entity_type population (set in the location-level metadata), so it received a separate observation_id from Line 215. Now both measurements are part of the same observation_id and the location is ignored entirely.

ehwenk commented

Yes, that is the intended workflow. If there is location-specific information, then it isn't actually a species-level observation but a population-level observation, because the trait measurement(s) refer only to individuals in that location (a population), not to all individuals of the species. Changing this might uncover some errors where we've declared entity_type = species when it should be entity_type = population. So we need to check that all studies still pivot.

yangsophieee commented

@ehwenk But previously, if we declared entity_type = species, there would be no population_id (as you said in your original comment) even if there was a location_id, so it wouldn't affect whether studies still pivot, would it?

ehwenk commented

@yangsophieee Yes you're right - I don't think I was clear. If a value is declared to have entity_type = species but should be entity_type = population (i.e. there are two values for the same trait x species), then the pivot-test will fail and it will be obvious that entity_type should be population.
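
To illustrate why duplicate values show up as a pivot failure, here is a rough sketch in plain Python. It is not the package's pivot test: the key columns (`taxon_name`, `observation_id`, `trait_name`), the function name and the sample rows are assumptions made for illustration, and it ignores repeat measurements.

```python
from collections import Counter

# Hypothetical sketch: a table can only be pivoted wide if each key maps to
# exactly one value. The key columns are assumed, not taken from the package.

def duplicated_pivot_keys(rows, key_fields=("taxon_name", "observation_id", "trait_name")):
    """Return the keys that occur more than once and would block a clean pivot."""
    counts = Counter(tuple(row[field] for field in key_fields) for row in rows)
    return sorted(key for key, n in counts.items() if n > 1)


# Invented data: two leaf_length values declared as one species-level observation.
rows = [
    {"taxon_name": "Acacia aneura", "observation_id": "01", "trait_name": "leaf_length", "value": "52"},
    {"taxon_name": "Acacia aneura", "observation_id": "01", "trait_name": "leaf_length", "value": "61"},
    {"taxon_name": "Acacia aneura", "observation_id": "01", "trait_name": "plant_height", "value": "4"},
]

clashes = duplicated_pivot_keys(rows)
```

A non-empty `clashes` list is the signal described above: the two values probably belong to different populations, so entity_type should be population rather than species.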