gadget-framework/mfdb

Survey indices with cv's


bthe commented

When importing survey indices, you would like to add information on uncertainty along with the index; e.g. when an index value is supplied, one would also like to have the CV (i.e. std. dev./mean). What is the best way of adding that information to mfdb? Should I import the CV as an additional time-series?
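As a quick illustration of that definition: the CV is the standard deviation divided by the mean, so storing the CV next to the index value is enough to recover the standard deviation later. A minimal Python sketch; the function names are hypothetical and not part of mfdb:

```python
import statistics


def coefficient_of_variation(values):
    """CV of repeated estimates: sample standard deviation divided by the mean."""
    return statistics.stdev(values) / statistics.mean(values)


def std_dev_from_cv(index_value, cv):
    """Recover the standard deviation from an index value and its CV."""
    return cv * index_value


# Illustrative numbers only: a CV of 0.5 on an index value of 200
# corresponds to a standard deviation of 100.
print(std_dev_from_cv(200, 0.5))  # 100.0
```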

bthe commented

I think it is fairly obvious I've never used this data type :) But, continuing with the question, I assume one would deal with indices of biomass by species in a similar fashion?

> Should I import the CV as an additional time-series?

Yes, but I think more "first-class" support for CV, i.e. an extra column to store it plus support in the querying functions, would make sense.
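A rough sketch of the "additional time-series" approach in plain Python, assuming a hypothetical convention where the CV series is named after the index with a `_cv` suffix and shares the same (year, area) keys. This is not mfdb's import format, only an illustration of why the two series have to be matched up again after querying. The values come from the sighting-survey example later in this thread.

```python
# Each row: (year, area, index value, cv)
rows = [
    (2005, "WG", 10792, 0.59),
    (2007, "WG", 9853, 0.43),
]


def split_index_and_cv(rows, name):
    """Split combined rows into an index series and a parallel CV series.

    The CV series uses a hypothetical "<name>_cv" naming convention and
    reuses the same (year, area) keys as the index series.
    """
    index_series = {"name": name, "data": []}
    cv_series = {"name": name + "_cv", "data": []}
    for year, area, value, cv in rows:
        index_series["data"].append({"year": year, "area": area, "value": value})
        cv_series["data"].append({"year": year, "area": area, "value": cv})
    return index_series, cv_series


def rejoin(index_series, cv_series):
    """Match each index value with its CV on (year, area)."""
    cvs = {(r["year"], r["area"]): r["value"] for r in cv_series["data"]}
    return [
        (r["year"], r["area"], r["value"], cvs[(r["year"], r["area"])])
        for r in index_series["data"]
    ]


idx, cv = split_index_and_cv(rows, "sighting_abundance")
print(cv["name"])  # sighting_abundance_cv
```

A dedicated CV column would remove the need for the naming convention and the re-join step.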

> indices of biomass by species

These could be stored in sample.weight with count being NA / NULL. Then you have species and all the other metadata fields handy.
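To make the suggestion concrete, here is a sketch with made-up records in plain Python dictionaries (the field names mirror the wording above but are assumptions, not mfdb's schema). The biomass index lives in `weight`, `count` is `None`, and any aggregation over `count` has to cope with the missing values, much as SQL `SUM` ignores NULLs and returns NULL when every input is NULL.

```python
from collections import defaultdict

# Made-up records for illustration: a biomass index stored in the weight
# field, with count left as None (NA / NULL in the database).
samples = [
    {"year": 2005, "area": "A1", "species": "SP1", "count": None, "weight": 1200.0},
    {"year": 2005, "area": "A2", "species": "SP1", "count": None, "weight": 800.0},
    {"year": 2005, "area": "A1", "species": "SP2", "count": None, "weight": 450.0},
]


def biomass_by_species(samples):
    """Total the weight field per (year, species); count is never read."""
    totals = defaultdict(float)
    for s in samples:
        totals[(s["year"], s["species"])] += s["weight"]
    return dict(totals)


def total_count(samples):
    """Sum count like SQL SUM: None values are skipped, and the result is
    None when every count is None."""
    counts = [s["count"] for s in samples if s["count"] is not None]
    return sum(counts) if counts else None


print(biomass_by_species(samples))
# {(2005, 'SP1'): 2000.0, (2005, 'SP2'): 450.0}
```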

bthe commented

> Yes, but having more "first-class" support for CV with an extra column to store it / support in the querying functions would make sense I think.

Sounds good.

> These could be stored in sample.weight with count being NA / NULL. Then you have species and all the other metadata fields handy.

Yes, this is exactly what I have leaned towards in the past. To give you a bit of background for this question, I'm thinking about abundance estimates from sighting surveys, which are only available by division and species; no other attributes are available. So a typical dataset looks like:

year  division  count  cv    species
2005  WG        10792  0.59  MIW
2007  WG        9853   0.43  MIW
2015  WG        5241   0.49  MIW
2007  WC        20741  0.3   MIW
2007  CIP       1350   0.38  MIW
...

I've been picking an areacell at random from the division and assigning the abundance estimates to that. I can sort of squeeze this information into both tables, but in both cases you need to be careful when querying the data.
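The random-areacell workaround can be sketched as follows, using the rows from the table above. The division-to-areacell mapping and its cell names are hypothetical; in practice they would be whatever areacells are already defined for each division. A seeded `random.Random` makes the assignment reproducible. The catch is that the whole division estimate ends up in a single cell, so any query finer than division level will misplace it.

```python
import random

# Hypothetical division -> areacell mapping.
division_cells = {
    "WG": ["wg_1", "wg_2", "wg_3"],
    "WC": ["wc_1", "wc_2"],
    "CIP": ["cip_1"],
}

# (year, division, count, cv, species), from the table above
estimates = [
    (2005, "WG", 10792, 0.59, "MIW"),
    (2007, "WG", 9853, 0.43, "MIW"),
    (2015, "WG", 5241, 0.49, "MIW"),
    (2007, "WC", 20741, 0.3, "MIW"),
    (2007, "CIP", 1350, 0.38, "MIW"),
]


def assign_random_areacell(estimates, division_cells, seed=None):
    """Attach each division-level estimate to one randomly chosen areacell
    of that division; a fixed seed makes the choice reproducible."""
    rng = random.Random(seed)
    assigned = []
    for year, division, count, cv, species in estimates:
        assigned.append({
            "year": year,
            "areacell": rng.choice(division_cells[division]),
            "count": count,
            "cv": cv,
            "species": species,
        })
    return assigned


rows = assign_random_areacell(estimates, division_cells, seed=1)
```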

I think this is why we made count NULLable in the first place. I'm not sure which option makes more sense.

One of the reasons survey index exists is so that it can be applied as an abundance scaling factor to other queries, in which case you choose indices by the name you gave them. I'm not sure that join makes sense if we add a species column as well.
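A conceptual sketch of that concern in plain Python. This is not how mfdb performs the join internally, and all names and values are made up. If the index carries a species column but the join keys ignore it, one query result can match several index rows, so species would have to become part of the join.

```python
def scale_by_index(results, index, key_fields):
    """Multiply each result's value by the single index row matching it on
    key_fields; raise if the match is missing or ambiguous."""
    scaled = []
    for r in results:
        matches = [i for i in index if all(i[k] == r[k] for k in key_fields)]
        if len(matches) != 1:
            raise ValueError(f"expected 1 index row for {r}, found {len(matches)}")
        scaled.append({**r, "value": r["value"] * matches[0]["value"]})
    return scaled


# Made-up index with a species column: two rows share year and area.
index = [
    {"year": 2007, "area": "WG", "species": "SP1", "value": 2.0},
    {"year": 2007, "area": "WG", "species": "SP2", "value": 0.5},
]
results = [{"year": 2007, "area": "WG", "species": "SP1", "value": 10.0}]

print(scale_by_index(results, index, ["year", "area", "species"]))
# [{'year': 2007, 'area': 'WG', 'species': 'SP1', 'value': 20.0}]
# Joining on ["year", "area"] alone raises ValueError (2 matching rows).
```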

> I've been picking an areacell at random from the division and assigning the abundance estimates to that.

I think I'd add an areacell with the same name as the division to that division (if that makes sense).
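A small sketch of that suggestion, again with a hypothetical division-to-areacell mapping in plain Python. Each division gains an areacell carrying its own name, so a division-level estimate can be attached to it deterministically instead of to an arbitrary real cell.

```python
def add_division_areacells(division_cells):
    """Return a copy of the division -> areacell mapping in which each
    division also contains an areacell named after the division itself."""
    updated = {}
    for division, cells in division_cells.items():
        updated[division] = list(cells)
        if division not in cells:
            updated[division].append(division)
    return updated


division_cells = {"WG": ["wg_1", "wg_2"], "CIP": []}
division_cells = add_division_areacells(division_cells)
print(division_cells)  # {'WG': ['wg_1', 'wg_2', 'WG'], 'CIP': ['CIP']}

# A division-level estimate can then use the division name as its areacell.
estimate = {"year": 2007, "division": "WG", "count": 9853, "cv": 0.43}
estimate["areacell"] = estimate["division"]
```

Running the function a second time leaves the mapping unchanged, so it is safe to apply on every import.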