- Python 2.6 or 2.7
- Modules: numpy, scipy, argtools
A tool for estimating sequences of CNV alleles from multiple individuals. The allele ratio of each sample is also inferred.
$ python cnvalloc estimate_alleles -v -K 4 examples/hist.txt
The estimation result is emitted in JSON format.
Check performance for each number of haplotypes K = 1..10:
$ parallel -k 'python cnvalloc estimate_alleles -K {} examples/hist.txt | python cnvalloc evaluate_alleles -r /dev/stdin -a examples/haps.txt' ::: {1..10}
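Where GNU parallel is not available, the same sweep can be sketched in Python. This is only an illustration: the helper names `evaluation_command` and `run_sweep` are hypothetical, the commands use only the options shown above, and the runs are sequential rather than parallel.

```python
import subprocess


def evaluation_command(k, hist="examples/hist.txt", haps="examples/haps.txt"):
    """Build the estimate | evaluate shell pipeline for one value of K."""
    return (
        "python cnvalloc estimate_alleles -K {k} {hist} | "
        "python cnvalloc evaluate_alleles -r /dev/stdin -a {haps}"
    ).format(k=k, hist=hist, haps=haps)


def run_sweep(ks):
    """Run each pipeline in turn and return {K: captured stdout}.

    Assumes the current directory contains cnvalloc, as in the commands above.
    """
    results = {}
    for k in ks:
        proc = subprocess.Popen(evaluation_command(k), shell=True,
                                stdout=subprocess.PIPE)
        out, _ = proc.communicate()
        results[k] = out.decode()
    return results


# Only build the command strings here; call run_sweep(range(1, 11)) to execute.
commands = [evaluation_command(k) for k in range(1, 11)]
print(commands[0])
```

Calling `run_sweep(range(1, 11))` from the repository root would collect the evaluation output for each K in order.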
- examples/haps.txt: True haplotypes for evaluation
  - column 1: Sample id
  - column 2: Allele id
  - column n (n>2): The base of the allele at the (n-2)th variant site
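The column layout above can be read with a few lines of Python. This is a minimal sketch, assuming the columns are whitespace-separated (the delimiter is not stated here). The function name `parse_haps` and the example rows are made up for illustration.

```python
def parse_haps(lines):
    """Return {sample id: [(allele id, [base at site 1, base at site 2, ...]), ...]}."""
    haps = {}
    for line in lines:
        fields = line.split()  # assumption: whitespace-separated columns
        if not fields:
            continue  # skip blank lines
        sample_id, allele_id, bases = fields[0], fields[1], fields[2:]
        haps.setdefault(sample_id, []).append((allele_id, bases))
    return haps


# Made-up rows in the documented layout: sample id, allele id, one base per site.
example = [
    "s1 a1 A C G",
    "s1 a2 A T G",
    "s2 a1 A C G",
]
haps = parse_haps(example)
print(haps["s1"][1])
```

Each sample keeps a list of rows rather than a dictionary keyed by allele id, so repeated allele ids within a sample are preserved.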
- examples/hist.txt: Observed bases of sequence data at variant sites
  - column 1: Sample id
  - column 2: One of the 'ATCG' bases
  - column n (n>2): The number of times that base was observed at the (n-2)th variant site
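As an illustration of this layout, the sketch below parses histogram rows into per-sample base counts and computes the depth at one site. It assumes whitespace-separated columns and integer counts. `parse_hist`, `site_depth`, and the example rows are hypothetical.

```python
def parse_hist(lines):
    """Return {sample id: {base: [count at site 1, count at site 2, ...]}}."""
    hist = {}
    for line in lines:
        fields = line.split()  # assumption: whitespace-separated columns
        if not fields:
            continue  # skip blank lines
        sample_id, base = fields[0], fields[1]
        counts = [int(x) for x in fields[2:]]
        hist.setdefault(sample_id, {})[base] = counts
    return hist


def site_depth(hist, sample_id, site):
    """Total number of observed bases for one sample at a 0-based site index."""
    return sum(counts[site] for counts in hist[sample_id].values())


# Made-up rows: sample id, base, then one count per variant site.
example = [
    "s1 A 10 0",
    "s1 T 0 5",
    "s1 C 2 0",
    "s1 G 0 7",
]
hist = parse_hist(example)
print(site_depth(hist, "s1", 0))
```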
1. Make pileup histograms from BAM files:
   $ python cnvalloc bam2hist {BAM file n} -r chr1:10000000-100010000 > pileups.n.txt
2. Import the pileup files into a database such as sqlite3.
3. Select the variable sites to use according to some criterion (e.g. minor_count >= 15 for any of the samples).
4. Create an input file for cnvalloc estimate_alleles by querying the database.
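The import, selection, and export steps above could be sketched with Python's built-in sqlite3 module. This is not part of cnvalloc and rests on several assumptions:

- The pileup counts have already been parsed into (sample, site, base, count) tuples.
- The table and column names are invented.
- minor_count is taken to mean the total depth minus the count of the most frequent base.
- The output is tab-separated.

```python
import sqlite3


def load_counts(conn, rows):
    """Store (sample, site, base, n) tuples in a hypothetical table named pileup."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS pileup "
        "(sample TEXT, site INTEGER, base TEXT, n INTEGER)"
    )
    conn.executemany("INSERT INTO pileup VALUES (?, ?, ?, ?)", rows)


def select_sites(conn, min_minor=15):
    """Return sites where at least one sample has minor_count >= min_minor.

    Assumption: minor_count = total depth - count of the most frequent base.
    """
    query = """
        SELECT DISTINCT site FROM (
            SELECT sample, site, SUM(n) - MAX(n) AS minor_count
            FROM pileup GROUP BY sample, site
        ) WHERE minor_count >= ? ORDER BY site
    """
    return [row[0] for row in conn.execute(query, (min_minor,))]


def hist_lines(conn, sites):
    """Format rows as: sample id, base, then one count per selected site."""
    lines = []
    pairs = conn.execute(
        "SELECT DISTINCT sample, base FROM pileup ORDER BY sample, base"
    ).fetchall()
    for sample, base in pairs:
        counts = []
        for site in sites:
            row = conn.execute(
                "SELECT COALESCE(SUM(n), 0) FROM pileup "
                "WHERE sample = ? AND base = ? AND site = ?",
                (sample, base, site),
            ).fetchone()
            counts.append(str(row[0]))
        lines.append("\t".join([sample, base] + counts))
    return lines


# Made-up counts for two samples at three sites.
rows = [
    ("s1", 0, "A", 30), ("s1", 0, "C", 20),
    ("s1", 1, "A", 50), ("s1", 1, "C", 1),
    ("s1", 2, "G", 25), ("s1", 2, "T", 2),
    ("s2", 0, "A", 40), ("s2", 0, "C", 0),
    ("s2", 1, "A", 45), ("s2", 1, "C", 3),
    ("s2", 2, "G", 20), ("s2", 2, "T", 18),
]
conn = sqlite3.connect(":memory:")
load_counts(conn, rows)
sites = select_sites(conn)
for line in hist_lines(conn, sites):
    print(line)
```

Computing minor_count inside the query keeps the selection criterion in one place, so a different threshold or definition only changes `select_sites`.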
- Consider variant types other than mutations
- Write tools for steps 2-4 of the above workflow
T. Mimori et al., 2015, BMC Bioinformatics, "Estimating copy numbers of alleles from population-scale high-throughput sequencing data"
http://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-16-S1-S4