luntergroup/bamsplit

Supported VCF formats

mehrankr opened this issue · 2 comments

Thanks a lot for developing this very useful tool.
I know this is probably not ready for public use, but I was wondering which input VCF formats it supports.

I tried running it on VCF files generated by GATK ReadBackedPhasing, and it produces empty BAM files for support_0 and support_1.

This script would be of interest to a lot of people, so a quick-start guide would really help, although even without one it was quite easy to run.

Thanks a lot

Thanks for your question. I've only tested this script on output from Octopus. I actually wrote it as a prototype for the BAM realignment feature in Octopus, which is now fully functional and has a number of advantages over this script (reads are realigned, and supporting haplotypes are indicated with BAM tags). I'm not planning on developing this script any further at this point, but adapting it to support VCF output from other variant callers shouldn't be too difficult. You'd likely need to modify the way PS is used to define phase blocks: currently PS is assumed to refer to a previous record's POS (as in Octopus), rather than a unique string identifier (as in GATK).
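As a rough illustration of that adaptation, the sketch below groups phased records into phase blocks keyed by (CHROM, PS) and treats PS as an opaque label, so it does not care whether PS is a previous record's POS or an arbitrary identifier. This is not bamsplit's code: `parse_phased_record` and `group_phase_blocks` are hypothetical names, and the sketch assumes a tab-delimited, single-sample VCF.

```python
from collections import defaultdict


def parse_phased_record(line):
    """Return (chrom, pos, gt, ps) for a phased single-sample VCF data line, else None."""
    cols = line.rstrip("\n").split("\t")
    chrom, pos = cols[0], int(cols[1])
    # Pair FORMAT keys (column 9) with the first sample's values (column 10).
    fields = dict(zip(cols[8].split(":"), cols[9].split(":")))
    gt = fields.get("GT", ".")
    ps = fields.get("PS")
    if "|" not in gt or ps in (None, "."):
        return None  # unphased genotype or no phase set
    return chrom, pos, gt, ps


def group_phase_blocks(lines):
    """Group phased records by (CHROM, PS), treating PS as an opaque block label."""
    blocks = defaultdict(list)
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        rec = parse_phased_record(line)
        if rec is None:
            continue
        chrom, pos, gt, ps = rec
        blocks[(chrom, ps)].append((pos, gt))
    return dict(blocks)
```

Because the key is the raw PS string, an integer PS (Octopus-style) and a string PS both end up as valid block labels; only records sharing a label on the same chromosome are treated as phased relative to each other.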

Hi @dancooke,
Thanks for this nice tool. I have the same issue as @mehrankr. My phased .vcf is VCFv4.2. Could you please share an example of Octopus VCF output? And do you think it would work if I manually converted my VCFv4.2 file to the Octopus VCF format? Below are two sites from my .vcf.

##FORMAT=<ID=PS,Number=1,Type=Integer,Description="ID of Phase Set for Variant">
##FORMAT=<ID=PQ,Number=1,Type=Integer,Description="Phred QV indicating probability that this variant is incorrectly phased relative to the haplotype">
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  cyr34
chr6    4935    .       G       A       1236.77 .       AC=1;AF=0.500;AN=2;BaseQRankSum=-1.624;ClippingRankSum=0.000;DP=56;ExcessHet=3.0103;FS=2.302;MLEAC=1;MLEAF=0.500;MQ=56.83;MQRankSum=-3.348;QD=22.09;ReadPosRankSum=-2.014;SOR=0.402     GT:AD:DP:GQ:PS:PQ:PD    0|1:24,32:56:99:4935:100:74
chr6    6330    .       A       G       1240.77 .       AC=1;AF=0.500;AN=2;BaseQRankSum=-1.802;ClippingRankSum=0.000;DP=67;ExcessHet=3.0103;FS=4.630;MLEAC=1;MLEAF=0.500;MQ=47.06;MQRankSum=-5.509;QD=18.52;ReadPosRankSum=0.363;SOR=1.484      GT:AD:DP:GQ:PS:PQ:PD    1|0:13,54:67:99:4935:100:100

Thanks.
Chongjing
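To check whether a file already follows the PS convention described in the reply above (each PS pointing at the POS of an earlier record, or of the record itself, on the same chromosome), a quick check like the hypothetical `ps_matches_record_pos` below may help. It assumes a tab-delimited, single-sample VCF and tests only this one convention, so passing it does not by itself mean bamsplit will handle the file.

```python
def ps_matches_record_pos(lines):
    """Hypothetical check: does every phased record's PS equal the POS of a
    record at or before it on the same chromosome?"""
    seen = {}  # chrom -> set of POS values seen so far (including the current record)
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        cols = line.rstrip("\n").split("\t")
        chrom, pos = cols[0], int(cols[1])
        seen.setdefault(chrom, set()).add(pos)
        fields = dict(zip(cols[8].split(":"), cols[9].split(":")))
        ps = fields.get("PS", ".")
        if "|" not in fields.get("GT", "") or ps == ".":
            continue  # unphased record: nothing to check
        if not ps.isdigit() or int(ps) not in seen[chrom]:
            return False  # PS is a non-integer label or points at no known POS
    return True
```

For the two records above, both PS values are 4935, which is the POS of the first record, so this check would pass for those lines.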