Admix is a simple tool to calculate ancestry composition (admixture proportions) from SNP raw data provided by various DNA testing vendors (such as 23andme and AncestryDNA).
You can use pip to install Admix directly from this GitHub repository:
pip install git+https://github.com/stevenliuyi/admix
You can also install Admix from PyPI:
pip install admix
Note that due to the size limit, the package on PyPI only contains five models (K7b, K12b, globe13, world9 and E11). If you want all models, you can download them or install Admix from this repository as shown above.
Suppose that you have already downloaded your 23andme raw data and placed it in the current directory under the name my_raw_data.txt. You can then perform the admixture calculation by specifying the calculation model (K7b in this example):
admix -f my_raw_data.txt -v 23andme -m K7b
You can also set multiple models for calculation:
admix -f my_raw_data.txt -v 23andme -m K7b K12b
If no models are set, the program will apply all the available models:
admix -f my_raw_data.txt -v 23andme
You can choose the raw data format by changing the -v or --vendor parameter. The supported values are listed in the vendor table below.
You may also set the -o or --output parameter to write the ancestry composition results into a file:
admix -f my_raw_data.txt -v 23andme -o result.txt
If you don't have your raw data yet, you can also test the program by using a demo 23andme data file provided by the program:
admix -m world9
Chinese users may turn on the -z flag so that the population names are displayed in Chinese:
admix -z -m E11
In addition, you can use the --sort flag to sort the proportions and the --ignore-zeros flag to display only non-zero proportions.
For more help, run:
admix -h
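
When Admix is driven from a script, the options described above can be assembled programmatically. The following is a minimal sketch, not part of Admix itself: `build_admix_command` is a hypothetical helper, and it uses only the flags documented above (-f, -v, -m, -o, --sort, --ignore-zeros, -z).

```python
import shlex


def build_admix_command(raw_file=None, vendor=None, models=(), output=None,
                        sort=False, ignore_zeros=False, chinese=False):
    """Assemble an `admix` argument list (hypothetical helper, not Admix API)."""
    cmd = ["admix"]
    if raw_file:
        cmd += ["-f", raw_file]      # raw data file
    if vendor:
        cmd += ["-v", vendor]        # raw data format
    if models:
        cmd += ["-m", *models]       # one or more models; omit to use all
    if output:
        cmd += ["-o", output]        # write results to a file
    if sort:
        cmd.append("--sort")
    if ignore_zeros:
        cmd.append("--ignore-zeros")
    if chinese:
        cmd.append("-z")
    return cmd


cmd = build_admix_command("my_raw_data.txt", "23andme", ["K7b", "K12b"], sort=True)
print(shlex.join(cmd))
# prints: admix -f my_raw_data.txt -v 23andme -m K7b K12b --sort

# To actually run it (requires Admix to be installed):
# import subprocess
# subprocess.run(cmd, check=True)
```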
- English
Command: admix -m K12b
Output:
Gedrosia: 0.06%
Siberian: 3.71%
Northwest African: 0.00%
Southeast Asian: 33.43%
Atlantic Med: 0.07%
North European: 0.00%
South Asian: 0.00%
East African: 0.00%
Southwest Asian: 0.01%
East Asian: 62.72%
Caucasus: 0.00%
Sub Saharan: 0.00%
- Chinese
Command: admix -m K12b -z
Output:
格德罗西亚: 0.06%
西伯利亚: 3.71%
西北非: 0.00%
东南亚: 33.43%
大西洋地中海: 0.07%
北欧: 0.00%
南亚: 0.00%
东非: 0.00%
西南亚: 0.01%
东亚: 62.72%
高加索: 0.00%
撒哈拉以南非洲: 0.00%
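
For post-processing, output in the "population: percentage" form shown above can be turned into a dictionary. This is a rough sketch that assumes the result lines look exactly like the examples (an ASCII colon, then a number followed by %); `parse_admix_output` is a hypothetical helper, and any other lines are simply skipped.

```python
def parse_admix_output(text):
    """Parse 'Population: 12.34%' lines into a {population: percent} dict."""
    proportions = {}
    for line in text.splitlines():
        name, sep, value = line.rpartition(":")
        value = value.strip()
        if not sep or not value.endswith("%"):
            continue  # not a result line (blank line, header, command echo, ...)
        try:
            proportions[name.strip()] = float(value[:-1])
        except ValueError:
            continue  # something before '%' that is not a number
    return proportions


sample = """\
Siberian: 3.71%
Southeast Asian: 33.43%
East Asian: 62.72%
"""
result = parse_admix_output(sample)
largest = max(result, key=result.get)
print(largest, result[largest])
```

The same function works on the Chinese output, since only the text before the colon changes.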
Admix supports raw data formats from the following DNA testing vendors with the -v or --vendor parameter:
parameter value | vendor |
---|---|
23andme | 23andme |
ancestry | AncestryDNA |
ftdna | FamilyTreeDNA Family Finder |
ftdna2 | FamilyTreeDNA Family Finder (new format) |
wegene | WeGene |
myheritage | MyHeritageDNA |
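
To give a feel for what a raw data file contains, here is a small sketch of a reader for the 23andme layout. It is not Admix's own parser. It assumes the usual 23andme export layout: comment lines starting with #, followed by tab-separated rsid, chromosome, position and genotype columns, with no-calls written using dashes. The rsIDs and genotypes in the sample are made up for illustration.

```python
def read_23andme(lines):
    """Return {rsid: genotype} from 23andme-style raw data lines (sketch)."""
    genotypes = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and '#' comment/header lines
        fields = line.split("\t")
        if len(fields) != 4:
            continue  # unexpected layout, ignore the line
        rsid, _chromosome, _position, genotype = fields
        if "-" in genotype:
            continue  # no-call, nothing usable at this marker
        genotypes[rsid] = genotype
    return genotypes


# Made-up sample in the assumed layout.
sample = [
    "# rsid\tchromosome\tposition\tgenotype",
    "rs0000001\t1\t1000\tAG",
    "rs0000002\t1\t2000\t--",
    "rs0000003\t2\t3000\tCC",
]
print(read_23andme(sample))

# Reading a real file would look like:
# with open("my_raw_data.txt") as handle:
#     genotypes = read_23andme(handle)
```

Other vendors use different column layouts, which is why the -v parameter has to match the file.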
Admix supports many publicly available admixture models. All calculator files are the property of their respective authors and are not covered by the license of this program. A link with more information is provided for each model.
model value | model name | source |
---|---|---|
K7b | Dodecad K7b | Link |
K12b | Dodecad K12b | Link |
globe13 | Dodecad globe13 | Link |
goble10 | Dodecad globe10 | Link |
world9 | Dodecad world9 | Link |
Eurasia7 | Dodecad Eurasia7 | Link |
Africa9 | Dodecad Africa9 | Link |
weac2 | Dodecad weac (West Eurasian cline) 2 | Link |
E11 | E11 | Link |
K36 | Eurogenes K36 | Link |
EUtest13 | Eurogenes EUtest K13 | Link |
Jtest14 | Eurogenes Jtest K14 | Link |
HarappaWorld | HarappaWorld | Link |
TurkicK11 | Turkic K11 | Link |
KurdishK10 | Kurdish K10 | Link |
AncientNearEast13 | Ancient Near East K13 | Link |
K7AMI | Eurogenes K7 AMI | Link |
K8AMI | Eurogenes K8 AMI | Link |
MDLPK27 | MDLP K27 | Link |
puntDNAL | puntDNAL K12 Ancient World | Link |
K47 | LM Genetics K47 | Link |
K7M1 | Tolan K7M1 | Link |
K13M2 | Tolan K13M2 | Link |
K14M1 | Tolan K14M1 | Link |
K18M4 | Tolan K18M4 | Link |
K25R1 | Tolan K25R1 | Link |
MichalK25 | Michal World K25 | Link |
A maximum likelihood estimation (MLE) algorithm is used to calculate the ancestry composition, and the implementation is fairly straightforward.
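Let $F_{nk}$ be the minor allele frequency of SNP marker $n$ for population $k$, let $l^{\text{minor}}_n$ and $l^{\text{major}}_n$ be the minor and major alleles for marker $n$ respectively, and let $G_{ni}$ be the allele at marker $n$ of the individual we're interested in ($i = 1, 2$). Our goal is to find the admixture fractions $q_k$ of the individual that maximize the log-likelihood function

$$\log L(q) = \chi\left(G_{ni} = l^{\text{minor}}_n\right) j_i \, \log\left(F_{nk}\, q_k\right) + \chi\left(G_{ni} = l^{\text{major}}_n\right) j_i \, \log\left[\left(J_{nk} - F_{nk}\right) q_k\right]$$

where $\chi$ is the indicator function, and $J$ and $j$ are the all-ones matrix and vector respectively. Note that the Einstein summation convention is implied here, so repeated indices ($n$, $i$ and $k$) are summed over. With the constraints $0 \le q_k \le 1$ and $\sum_k q_k = 1$, we can obtain the admixture proportions $q_k$ by applying constrained optimization techniques.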
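
To make the estimation concrete, here is a small pure-Python sketch on made-up data. It is not Admix's implementation: instead of a general-purpose constrained optimizer it uses the classic EM update for mixture proportions, which keeps the fractions non-negative and summing to one at every step. The function names and the toy frequencies are assumptions for illustration. Per marker, the log-likelihood adds (number of minor alleles) × log(Σ_k F_nk q_k) and (number of major alleles) × log(Σ_k (1 − F_nk) q_k).

```python
import math


def log_likelihood(minor_counts, major_counts, freqs, q):
    """Log-likelihood of admixture fractions q.

    minor_counts[n] / major_counts[n]: minor / major alleles carried at marker n.
    freqs[n][k]: minor allele frequency of marker n in population k.
    """
    total = 0.0
    for g_min, g_maj, f_row in zip(minor_counts, major_counts, freqs):
        p_minor = sum(f * qk for f, qk in zip(f_row, q))
        p_major = sum((1.0 - f) * qk for f, qk in zip(f_row, q))
        if g_min:
            total += g_min * math.log(p_minor)
        if g_maj:
            total += g_maj * math.log(p_major)
    return total


def estimate_fractions(minor_counts, major_counts, freqs, iterations=200):
    """EM estimate of q, starting from equal fractions.

    Assumes all frequencies lie strictly between 0 and 1.
    """
    k = len(freqs[0])
    q = [1.0 / k] * k
    n_alleles = sum(minor_counts) + sum(major_counts)
    for _ in range(iterations):
        new_q = [0.0] * k
        for g_min, g_maj, f_row in zip(minor_counts, major_counts, freqs):
            p_minor = sum(f * qk for f, qk in zip(f_row, q))
            p_major = sum((1.0 - f) * qk for f, qk in zip(f_row, q))
            for j, f in enumerate(f_row):
                # Expected share of each allele copy attributed to population j.
                if g_min:
                    new_q[j] += g_min * f * q[j] / p_minor
                if g_maj:
                    new_q[j] += g_maj * (1.0 - f) * q[j] / p_major
        q = [value / n_alleles for value in new_q]
    return q


# Toy data: 4 markers, 2 populations (made-up frequencies).
freqs = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.7]]
minor_counts = [2, 2, 0, 0]   # genotype resembles population 0
major_counts = [0, 0, 2, 2]
q = estimate_fractions(minor_counts, major_counts, freqs)
print([round(value, 4) for value in q])
```

Each EM step cannot decrease the log-likelihood, so the result can be checked by comparing `log_likelihood` at the estimate against the value at the uniform starting point.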