fgvieira/ngsF

ngsF leads to unexpected inbreeding estimate of zero

larahurban opened this issue · 5 comments

Thank you very much for providing this package! I just applied it to an inbred species using the parameters that are recommended for low-coverage data ("--init_values r --min_epsilon 1e-9"), since my individuals show quite a spread of coverage (between ~3x and 20x). For a few individuals, I get an estimated inbreeding coefficient of zero, which is not expected given what we know about the species. May I ask if you have an idea why this might be the case? Thank you very much for your help!

Hi,

and thank you for your interest in ngsF!

How many samples do you have?
Are they all from the same population?
Are these individuals with F=0 the ones with lower coverage?

Thank you very much for your reply, @fgvieira.

I have 84 samples and they are all expected to be from the same population (the species in question only has ~500 individuals left, so all individuals are expected to be related).

I checked and it is unfortunately not the case that the F=0 individuals are the ones with low coverage; indeed, it rather seems to be the other way round - all high-coverage samples have F=0 (I am just calculating the standard deviation across the genome to have some more detailed information). Altogether, 21 out of the 84 individuals have F=0.

I did some more research and the only two factors that I found to potentially lead to issues are: 1) I did not filter for LD or linked variants (do you by any chance know of a tool that would do this based on genotype likelihoods?) and 2) I have not removed the variants that deviate from HW yet.

Thank you very much for your help - any hint would be highly appreciated.

This is a bit tricky, since the ngsF model assumes independence of individuals (unrelated) and of sites (unlinked). I know this is not ideal (since inbreeding in some populations is due to crosses between related individuals), but it is usually not a problem, since individuals are usually not related enough to bias the estimates. However, this might not hold in your case if the species only has 500 individuals. I also had some plans to extend the model to account for related individuals, but never really got the time for it.

As for unlinked sites, you can try to use ngsLD to prune linked sites and see if it helps.
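
To make the idea of pruning concrete, here is a minimal sketch of greedy LD pruning in Python. It is not ngsLD's own pruning algorithm or file format: the function name `prune_linked_sites`, the `pairs` dictionary of pairwise r2 values, and the `max_r2=0.2` threshold are all assumptions chosen for illustration. A site is kept only if it is not in strong LD with any site kept before it.

```python
def prune_linked_sites(sites, pairs, max_r2=0.2):
    """Greedy pruning sketch (hypothetical helper, not part of ngsLD).

    sites:  list of site identifiers, in genomic order
    pairs:  dict mapping frozenset({site_a, site_b}) -> pairwise r2
            (pairs missing from the dict are treated as unlinked)
    max_r2: assumed threshold above which two sites count as linked
    """
    kept = []
    for site in sites:
        # Drop the site if it is linked to any site we already kept.
        linked = any(
            pairs.get(frozenset((site, other)), 0.0) > max_r2 for other in kept
        )
        if not linked:
            kept.append(site)
    return kept


# Made-up example values: the first two sites are tightly linked.
example_sites = ["chr1:100", "chr1:150", "chr1:9000"]
example_pairs = {
    frozenset(("chr1:100", "chr1:150")): 0.9,
    frozenset(("chr1:100", "chr1:9000")): 0.01,
    frozenset(("chr1:150", "chr1:9000")): 0.02,
}
pruned = prune_linked_sites(example_sites, example_pairs)
```

With these made-up values, `chr1:150` is dropped because it is linked to `chr1:100`, while `chr1:9000` is retained.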

However, I'd not remove variants that deviate from HW, since those are the most informative ones when calculating inbreeding.
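
The reason HW deviations carry the signal can be seen from the standard expectation for genotype frequencies under inbreeding: with allele frequency p and inbreeding coefficient F, heterozygotes are reduced by a factor (1 - F) relative to Hardy-Weinberg proportions. The sketch below illustrates this in Python; the function names are hypothetical, and the simple moment estimate shown here is only an illustration, not the likelihood-based method that ngsF uses.

```python
def genotype_freqs(p, F):
    """Expected genotype frequencies at a biallelic site with allele
    frequency p and inbreeding coefficient F (F = 0 gives HW proportions)."""
    q = 1.0 - p
    return {
        "AA": p * p + F * p * q,
        "Aa": 2.0 * p * q * (1.0 - F),  # heterozygote deficit grows with F
        "aa": q * q + F * p * q,
    }


def f_from_heterozygosity(p, observed_het):
    """Moment estimate F = 1 - H_obs / H_exp (illustration only)."""
    expected_het = 2.0 * p * (1.0 - p)
    return 1.0 - observed_het / expected_het


hw = genotype_freqs(0.5, 0.0)      # Hardy-Weinberg proportions
inbred = genotype_freqs(0.5, 0.5)  # fewer heterozygotes than under HW
```

Sites whose genotype proportions match HW exactly imply F = 0 under this relationship, so filtering out the sites that deviate from HW would remove exactly the heterozygote deficit that an inbreeding estimate relies on.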

Thank you very much @fgvieira! I will give it one more try and see if pruning of linked sites leads to more meaningful results - if not it might indeed be that, as you say, my individuals are too closely related for your approach. Thank you very much again!

For future users who might encounter a similar problem: Using the ngsF.sh script within ngsF will run 20 iterations to avoid reaching a local maximum, and together with the LD pruning it allowed me to get non-zero inbreeding coefficients (as expected). Thanks again for your help @fgvieira!
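
As a rough illustration of why repeated runs from different starting values help, here is a toy sketch in Python. It does not reproduce ngsF.sh or the ngsF likelihood: `toy_loglik` is an invented surface with a local maximum at 0 and a higher maximum near 0.6, and `hill_climb` and `best_of_restarts` are hypothetical helpers. The point is only that a single run can get stuck, while keeping the best of several random starts is more likely to find the higher maximum.

```python
import random


def toy_loglik(x):
    # Invented surface: local maximum at x = 0, global maximum at x = 0.6.
    return max(-1.0 - 50.0 * x * x, -10.0 * (x - 0.6) ** 2)


def hill_climb(loglik, x0, step=0.01, lo=0.0, hi=1.0, max_iter=10_000):
    """Move to the better neighbour until no neighbour improves."""
    x = x0
    for _ in range(max_iter):
        candidates = [c for c in (x - step, x + step) if lo <= c <= hi]
        best = max(candidates, key=loglik)
        if loglik(best) <= loglik(x):
            break
        x = best
    return loglik(x), x


def best_of_restarts(loglik, n_starts=20, seed=1):
    """Run the optimiser from several random starts; keep the best run."""
    rng = random.Random(seed)
    runs = [hill_climb(loglik, rng.random()) for _ in range(n_starts)]
    return max(runs)  # tuples compare by log-likelihood first


stuck_ll, stuck_x = hill_climb(toy_loglik, 0.05)  # a single unlucky start
best_ll, best_x = best_of_restarts(toy_loglik)    # best of 20 starts
```

In this toy setting the single start at 0.05 converges to the local maximum near 0, whereas the best of 20 random starts ends up near 0.6.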