location_id should be NA for all `entity_type: species`
ehwenk opened this issue · 4 comments
As I'm writing up tutorials for traits.build, I've discovered that there are observations with `entity_type: species` that have a `location_id`. They correctly do not have a `population_id`, but are somehow still assigned a `location_id`, presumably because they are read in from a row that carries location information (for population-level or individual-level measurements).
We should:
- Actively set `location_id` to NA if `entity_type: species`, after all identifiers are created.
- Add a test to confirm that, within a dataset, `location_id` is always NA for species-level measurements.
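The two steps above could be sketched roughly as follows. This is a minimal illustration in plain Python, not the traits.build implementation: the row structure, the function names `clear_species_location_ids` and `species_rows_with_location`, and the example values are all assumptions made for the sketch, and `None` stands in for NA.

```python
# Hypothetical records: the field names mirror the identifiers discussed in
# this issue, but the layout and values are invented for illustration.
NA = None  # stand-in for a missing value


def clear_species_location_ids(rows):
    """Return copies of the rows with location_id set to NA for species-level rows."""
    cleaned = []
    for row in rows:
        row = dict(row)  # copy so the input rows are left untouched
        if row.get("entity_type") == "species":
            row["location_id"] = NA
        cleaned.append(row)
    return cleaned


def species_rows_with_location(rows):
    """Return species-level rows that still carry a location_id (ideally none)."""
    return [
        row
        for row in rows
        if row.get("entity_type") == "species" and row.get("location_id") is not NA
    ]


rows = [
    {"entity_type": "species", "location_id": "01", "population_id": NA},
    {"entity_type": "population", "location_id": "01", "population_id": "01"},
]

# Step 1: clear location_id once all identifiers exist.
cleaned = clear_species_location_ids(rows)

# Step 2: the check a test could make within a dataset.
assert species_rows_with_location(cleaned) == []
```

Running the check on the uncleaned `rows` returns the first record, which is the situation described at the top of this issue.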
I've fixed this here. Species-level measurements will now have their `location_id` removed, and hence no location-level metadata will be assigned to them. Just checking that this is the intended workflow, @ehwenk?
Test_2023_4 is an example of this. Lines 214-215 have two species measurements with `repeat_measurements_id: TRUE`. Previously, the "australia" `location_name` for the first measurement would assign `entity_type` population (assigned in the location-level metadata), and that measurement would then get a separate `observation_id` from line 215. Now it's part of the same `observation_id` and location is completely ignored.
Yes, that is the intended workflow. If there is location-specific information, then it isn't actually a species-level observation but a population-level observation, because the trait measurements refer only to individuals in that location (a population), not to all individuals of the species. Changing this might uncover some errors where we've declared `entity_type = species` when it should be `entity_type = population`, so we need to check that all studies still pivot.
@ehwenk But previously, if we declared `entity_type = species`, there would be no `population_id` (as you said in your original comment) even if there was a `location_id`, so it wouldn't affect whether studies still pivot, would it?
@yangsophieee Yes, you're right - I don't think I was clear. If a value is declared to have `entity_type = species` but should be `entity_type = population` (i.e. there are two values for the same trait x species), then the pivot test will fail and it will be obvious that `entity_type` should be population.
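To illustrate the "two values for the same trait x species" case, here is a simplified stand-in for that kind of check. It is not the actual pivot test: the function name `find_species_pivot_conflicts`, the `taxon_name` / `trait_name` / `value` fields, and the example rows are assumptions made for this sketch.

```python
from collections import Counter


def find_species_pivot_conflicts(rows):
    """Return (taxon_name, trait_name) pairs with more than one species-level value."""
    counts = Counter(
        (row["taxon_name"], row["trait_name"])
        for row in rows
        if row["entity_type"] == "species"
    )
    return sorted(key for key, n in counts.items() if n > 1)


# Invented example rows: species_a has two leaf_area values declared as
# species-level, which suggests they are really population-level values.
rows = [
    {"taxon_name": "species_a", "trait_name": "leaf_area", "entity_type": "species", "value": 10.0},
    {"taxon_name": "species_a", "trait_name": "leaf_area", "entity_type": "species", "value": 12.0},
    {"taxon_name": "species_b", "trait_name": "leaf_area", "entity_type": "species", "value": 8.0},
]

conflicts = find_species_pivot_conflicts(rows)
```

Any pair returned here would be a candidate for re-declaring as `entity_type = population`.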