calculate_haplotype_statistics.py slightly differs from block headers
Hi @vibansal, I have a question about the calculate_haplotype_statistics.py script. I noticed that the phased count and num snps max blk values reported by the script differ from those in the BLOCK headers of my .hap file. For instance, if I sum the total number of phased SNVs across blocks and check the number of SNVs in the largest block of the .hap file, I get slightly different counts compared to the script output.
If I sum the phased field for all blocks, I get 189701. The header of my largest block is as follows:
BLOCK: offset: 12 len: 189252 phased: 188348 SPAN: 248704444 fragments 663113
However, the output from calculate_haplotype_statistics.py gives the following numbers with -i on:
phased count: 188484
num snps max blk: 188057
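For reference, the header-based totals can be reproduced with something like the sketch below. It assumes every block header has the same layout as the one shown above and takes the "largest" block to be the one with the highest phased value; the function names are only illustrative and are not part of HapCUT2.

```python
import re

# Matches block headers laid out like the example above, e.g.
# BLOCK: offset: 12 len: 189252 phased: 188348 SPAN: 248704444 fragments 663113
# (assumption: all headers in the .hap file follow this layout)
BLOCK_RE = re.compile(r"^BLOCK:.*?\bphased:\s*(\d+)")


def header_phased_counts(lines):
    """Return the 'phased' value of every BLOCK header found in lines."""
    counts = []
    for line in lines:
        match = BLOCK_RE.match(line)
        if match:
            counts.append(int(match.group(1)))
    return counts


def summarize(lines):
    """Return (sum of phased over all blocks, largest phased value)."""
    counts = header_phased_counts(lines)
    return sum(counts), max(counts, default=0)
```

Passing an open file handle for the .hap file to summarize() returns the two header-based numbers, which can then be compared with phased count and num snps max blk from the script.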
I wonder if there is some kind of filter implemented in the script that causes this?
Best,
Mikhail