gbradburd/conStruct

Using multiallelic data

Closed this issue · 3 comments

Dear Dr. Bradburd,

Your manuscript states that conStruct can be used with biallelic SNPs. I have a dataset of 1,000 multiallelic microhaplotypes (2-20 alleles per locus). Is it possible to use this type of data in your software?
Thank you,
Matt Hopken

Hi Matt,

Sorry for the slow reply. Yes, you can potentially use this dataset in conStruct. There are two main ways to "SNP-ify" microsat data:

  1. Treat each allele at a locus as a separate biallelic locus. E.g., if you have 6 microsat alleles at a locus, you could treat those as 6 different biallelic loci. The allele frequency at each such "locus" in a sample would then be the number of times that allele is observed in that sample out of the total number of chromosomes genotyped in that sample. This procedure is suggested by Patterson et al. (2006) and, as they note, is based on a procedure from Cavalli-Sforza.

  2. Lump the microsat alleles at a single locus into "major" and "other," summing the frequencies of all non-major alleles into the "other" category.
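
As a rough illustration of the two recodings above, here is a minimal Python sketch for a single locus. It is not part of conStruct: the function names `split_alleles` and `lump_major`, the counts, and the choice of the "major" allele as the one most common across all samples pooled are all assumptions made for the example.

```python
def split_alleles(counts):
    """Option 1: one pseudo-locus per allele.

    counts[i][j] is the number of copies of allele j observed in sample i
    at a single multiallelic locus. Returns freqs[i][j], the frequency of
    allele j among all chromosomes genotyped in sample i (None if the
    sample has no genotyped chromosomes at this locus).
    """
    freqs = []
    for row in counts:
        total = sum(row)
        freqs.append([c / total if total else None for c in row])
    return freqs


def lump_major(counts):
    """Option 2: frequency of the major allele; everything else is "other".

    Assumption: the major allele is the one with the highest count summed
    over all samples (ties go to the first such allele).
    """
    n_alleles = len(counts[0])
    pooled = [sum(row[j] for row in counts) for j in range(n_alleles)]
    major = pooled.index(max(pooled))
    return [row[major] / sum(row) if sum(row) else None for row in counts]


# Hypothetical counts for two samples at one locus with three alleles.
counts = [[6, 2, 2], [2, 5, 3]]
option1 = split_alleles(counts)  # 2 samples x 3 pseudo-loci
option2 = lump_major(counts)     # 2 samples x 1 locus
```

Repeating this over all loci and binding the columns together would give a samples-by-loci frequency table for each option. Note that under option 1 the pseudo-loci from one original locus are not independent, since their frequencies sum to 1 within each sample.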

In choosing between options 1 and 2, I'd suggest calculating Fst (or some other measure of differentiation) on the original microsat data and comparing it to the same measure calculated on each of the altered datasets produced by options 1 and 2. Whichever altered dataset best matches the original is probably your best bet.
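
One way to sketch that comparison in Python is below. It uses Nei's G_ST as a stand-in for "Fst or some other differentiation measure"; that choice, the equal weighting of populations, the lack of any sample-size correction, and the example frequencies are all assumptions, not something conStruct prescribes.

```python
def gst(loci):
    """Nei's G_ST = (H_T - H_S) / H_T, pooled over loci.

    loci is a list of loci; each locus is a list of populations; each
    population is a list of allele frequencies that sum to 1.
    Populations are weighted equally and no sample-size correction is made.
    """
    h_s_sum = 0.0
    h_t_sum = 0.0
    for pops in loci:
        n_pops = len(pops)
        n_alleles = len(pops[0])
        # Mean expected heterozygosity within populations.
        h_s = sum(1 - sum(p * p for p in pop) for pop in pops) / n_pops
        # Expected heterozygosity of the population-averaged frequencies.
        mean_freqs = [sum(pop[j] for pop in pops) / n_pops
                      for j in range(n_alleles)]
        h_t = 1 - sum(p * p for p in mean_freqs)
        h_s_sum += h_s
        h_t_sum += h_t
    return (h_t_sum - h_s_sum) / h_t_sum


# Hypothetical data: one locus, two populations, three alleles.
original = [[[0.6, 0.2, 0.2], [0.2, 0.5, 0.3]]]

# Option 1: each allele becomes a biallelic pseudo-locus (p, 1 - p).
option1 = [[[pop[j], 1 - pop[j]] for pop in original[0]] for j in range(3)]

# Option 2: major allele (the first one here) versus everything else.
option2 = [[[pop[0], 1 - pop[0]] for pop in original[0]]]

for name, data in [("original", original), ("option 1", option1),
                   ("option 2", option2)]:
    print(name, round(gst(data), 4))
```

With this particular uncorrected, pooled estimator, option 1 reproduces the multiallelic value, because the biallelic heterozygosities 2p(1-p) summed over the alleles of a locus equal twice 1 - Σp². Estimators that correct for sample size or average per-locus ratios need not behave this way, so it is worth running the comparison with whichever measure you actually plan to report.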

Hope that helps!

Hi Matt - just doing some New Year book-keeping. Is this issue still open for you?