fgvieira/ngsLD

missing GL/GP data

cbird808 opened this issue · 2 comments

I'm wondering if there's a way to designate a set of genotype likelihoods (GL) or genotype probabilities (GP) as missing for an individual x locus combination. Or will ngsLD identify a GL or GP of 0.33, 0.33, 0.33 as missing and handle it appropriately?

My concern is that angsd represents missing data as 0.33, 0.33, 0.33, and that more reasonable estimates of the true probabilities, based on the MAF and Hardy-Weinberg (HW) expectations, are not being calculated. In most cases this will result in the 01 and 11 genotypes being overrepresented and the 00 genotype being underrepresented, given that small MAFs are more common than large ones.
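To put rough numbers on that concern, here is a minimal sketch comparing the flat 0.33, 0.33, 0.33 placeholder with Hardy-Weinberg genotype frequencies at a low MAF. The function names and the MAF of 0.05 are hypothetical and chosen only for illustration; this is not code from angsd or ngsLD.

```python
def hwe_genotype_probs(maf):
    """Hardy-Weinberg frequencies for genotypes (00, 01, 11),
    where `maf` is the frequency of the minor (1) allele."""
    if not 0.0 <= maf <= 1.0:
        raise ValueError("maf must be in [0, 1]")
    major = 1.0 - maf
    return (major * major, 2.0 * major * maf, maf * maf)


def expected_dosage(probs):
    """Expected number of minor alleles implied by (P00, P01, P11)."""
    return probs[1] + 2.0 * probs[2]


flat = (1.0 / 3.0, 1.0 / 3.0, 1.0 / 3.0)  # placeholder used for missing data
hwe = hwe_genotype_probs(0.05)            # illustrative MAF, an assumption

print("flat placeholder:", flat, "dosage", expected_dosage(flat))
print("HW at MAF 0.05:  ", hwe, "dosage", expected_dosage(hwe))
```

Under these assumptions the flat placeholder implies an expected minor-allele dosage of about 1, while the HW frequencies imply 2 x MAF, so a flat triplet that was read as real data would carry far more weight on 01 and 11 than a low MAF supports.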

ngsLD assumes missing data if all 3 GL are equal (up to a certain precision).
If you are worried about missing data biasing your results, you can use --ignore_miss_data and missing data will be ignored in all calculations.
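For anyone who wants to check how much of their input would be treated as missing under that rule, here is a minimal sketch of the idea. It is not ngsLD's actual code: the tolerance of 1e-6 and the helper names are assumptions, since the exact precision ngsLD uses is not stated here.

```python
import math


def is_missing(gl, tol=1e-6):
    """Return True if all three genotype likelihoods are equal within `tol`.

    `tol` is an assumed value, not the precision ngsLD uses.
    """
    a, b, c = gl
    return (math.isclose(a, b, rel_tol=0.0, abs_tol=tol)
            and math.isclose(b, c, rel_tol=0.0, abs_tol=tol))


def drop_missing_pairs(site_a, site_b, tol=1e-6):
    """Keep only individuals with informative data at both sites.

    `site_a` and `site_b` are lists of (GL00, GL01, GL11) triplets,
    one per individual, in the same individual order.
    """
    return [(a, b) for a, b in zip(site_a, site_b)
            if not is_missing(a, tol) and not is_missing(b, tol)]


site_a = [(0.90, 0.09, 0.01), (0.33, 0.33, 0.33), (0.10, 0.80, 0.10)]
site_b = [(0.80, 0.15, 0.05), (0.70, 0.20, 0.10), (0.33, 0.33, 0.33)]

kept = drop_missing_pairs(site_a, site_b)
print(len(kept), "of", len(site_a), "individuals kept for this site pair")
```

Dropping an individual for a site pair whenever either site is flat is one reading of "ignored in all calculations"; how ngsLD applies it internally may differ.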