phaseless

Imputation and Admixture for lcWGS in one goal

Primary language: C++. License: GNU General Public License v3.0 (GPL-3.0).

Phaseless performs genotype imputation and admixture inference from low-coverage whole-genome sequencing (lcWGS) data. The imputation model is in the spirit of the fastPHASE model but takes genotype likelihoods as input, whereas STITCH, in the same spirit, works on raw reads. The admixture inference is then modeled on the haplotype cluster information from the fastPHASE model.

Build

git clone https://github.com/Zilong-Li/phaseless
cd phaseless
make -j6

Usage

phaseless provides several subcommands. Run phaseless -h to list them.

Imputation

phaseless impute is parallelized to impute the whole genome at once: it runs multiple chunks in parallel, with each chunk handled by one thread. See the --chunksize option.

phaseless impute -g data/bgl.gz -c 10 -n 4 -s 100000

However, you may only be interested in imputing a single chunk. To switch the parallelism so that it works within a single chunk instead, use the --single-chunk option.

phaseless impute -g data/bgl.gz -c 10 -n 4 -S

Admixture

With the binary file written by the impute command above, we can run admixture inference for different values of k, the number of ancestries.

phaseless admix -b impute.pars.bin -k 3 -n 4
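
To compare several values of k, the admix command can be wrapped in a small loop. The following is a minimal sketch, not part of phaseless itself: the function name run_admix_range and the PHASELESS variable are hypothetical, and it assumes that admix accepts the -o output prefix mentioned in the Output section, so that runs with different k do not overwrite each other.

```shell
# Sketch: run `phaseless admix` once per k in [kmin, kmax].
# run_admix_range and PHASELESS are illustrative names, not part of phaseless.
# Assumes admix honors the -o output prefix; -n is passed through unchanged.
run_admix_range() {
    bin=$1    # binary file written by `phaseless impute`
    kmin=$2   # smallest k to try
    kmax=$3   # largest k to try
    n=$4      # value for -n, as in the example above
    k=$kmin
    while [ "$k" -le "$kmax" ]; do
        # Set PHASELESS=echo to print the commands instead of running them.
        ${PHASELESS:-phaseless} admix -b "$bin" -k "$k" -n "$n" -o "admix.k$k"
        k=$((k + 1))
    done
}
```

For example, run_admix_range impute.pars.bin 2 5 4 would run admix for k = 2 through 5. Prefixing the call with PHASELESS=echo prints the commands without running them, which is a cheap way to check the loop first.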

Parameters

We can also inspect and manipulate the parameters of the fastPHASE model using the binary file written by the impute command.

phaseless parse -b impute.pars.bin -c 0 ## single chunk, all samples
phaseless parse -b impute.pars.bin -c -1 -s samples.txt ## all chunks, specific samples

Plotting

Now we can plot the haplotype clusters.

./misc/plot_haplotype_cluster.R

misc/hapfreq.png

Output

If no output prefix is given with -o, the commands above write the following files:

❯ tree -L 1
.
├── admix.Q
├── admix.log
├── parse.haplike.bin
├── parse.log
├── impute.recomb
├── impute.pi
├── impute.vcf.gz
├── impute.pars.bin
└── impute.log
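
Before passing impute.pars.bin on to admix or parse, it can be handy to confirm that an impute run produced all of its default outputs. This is a minimal sketch, assuming the default file names listed above (no -o prefix). The helper name check_impute_outputs is hypothetical and not part of phaseless.

```shell
# Sketch: report which of the default impute outputs are missing from a directory.
# check_impute_outputs is an illustrative helper, not part of phaseless.
check_impute_outputs() {
    dir=${1:-.}   # directory to inspect, current directory by default
    missing=0
    for f in impute.vcf.gz impute.pars.bin impute.recomb impute.pi impute.log; do
        if [ ! -e "$dir/$f" ]; then
            echo "missing: $f"
            missing=$((missing + 1))
        fi
    done
    # Succeed only when every expected file is present.
    [ "$missing" -eq 0 ]
}
```

Calling check_impute_outputs with no argument inspects the current directory. The function returns a non-zero status if any file is missing, so it can guard a later step, for instance check_impute_outputs && phaseless admix -b impute.pars.bin -k 3 -n 4.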

Changes

See the news file for the list of changes.