gbradburd/conStruct

Error: greater number of loci than there are samples

Hi Gideon & others,

I'm working with a minimally invasive GT-seq SNP dataset. I had to filter my genotypes quite strictly to get rid of the positive-definite sample covariance error, so now I'm down to 117 loci and 461 samples. When I tried to run conStruct with that dataset, I got an error saying that the data must have more loci than samples. Is this really a requirement?

I did subset my data by collection year and tried again with 194 samples and 117 loci, and got the same error. I don't want to subset it much further, and I'd like to have this done at the individual level (not grouped by site).

Thank you!

Molly

Hi Molly,

Yes, it is a requirement of the model that there be more loci than samples (you can't calculate the likelihood of the data if that condition isn't met). I've often found that, by dropping the individuals with the most missing data, I can keep more loci in the total dataset (i.e., in your quest to make sure your data can be analyzed, you can try dropping individuals instead of loci). Have you tried that?
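The suggestion above (drop the individuals with the most missing data first, then filter loci) can be sketched on a toy example. This is only an illustration in plain Python, not conStruct code: the genotype matrix, the helper names (`keep_individuals`, `keep_loci`, `more_loci_than_samples`) and the missingness thresholds are all hypothetical, and the right cutoffs will depend on the dataset.

```python
# Hypothetical toy genotype matrix: rows are individuals, columns are loci.
# Values are allele counts (0, 1, 2); None marks a missing genotype.
genotypes = [
    [0, 1, 2, 1, 0],
    [1, 1, None, 2, 0],
    [None, None, 1, None, None],  # individual with mostly missing data
    [2, 0, 1, 1, 1],
]


def missing_fraction(values):
    """Fraction of entries in a row or column that are missing."""
    return sum(v is None for v in values) / len(values)


def keep_individuals(matrix, max_missing):
    """Row indices of individuals whose missingness is at most max_missing."""
    return [i for i, row in enumerate(matrix)
            if missing_fraction(row) <= max_missing]


def keep_loci(matrix, rows, max_missing):
    """Column indices of loci whose missingness, among the kept rows only,
    is at most max_missing."""
    n_loci = len(matrix[0])
    return [j for j in range(n_loci)
            if missing_fraction([matrix[i][j] for i in rows]) <= max_missing]


def more_loci_than_samples(n_loci, n_samples):
    """The condition described in this thread: strictly more loci than samples."""
    return n_loci > n_samples


# Filtering loci with every individual kept (assumed cutoff: no missing data).
all_rows = list(range(len(genotypes)))
loci_all = keep_loci(genotypes, all_rows, max_missing=0.0)

# Dropping the worst individual first (assumed cutoff: at most 50% missing).
good_rows = keep_individuals(genotypes, max_missing=0.5)
loci_good = keep_loci(genotypes, good_rows, max_missing=0.0)

print(len(loci_all), "loci for", len(all_rows), "samples")
print(len(loci_good), "loci for", len(good_rows), "samples")
print(more_loci_than_samples(len(loci_good), len(good_rows)))
```

In this toy case, one poorly genotyped individual causes four of the five loci to fail the locus filter; removing that individual first lets those loci through. It only helps when the loci are being lost to missing data, so it will not by itself rescue a panel that has fewer loci than individuals before any filtering.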

Thanks for the quick response and the information! I could try to drop specific individuals, but I have more individuals than loci from the start (without any filtering); that's just the nature of the dataset. I'll think about whether subsetting individuals in a specific way would be informative in any way...