Replace `vcfbreakmulti` with ngs-bits VcfBreakMulti
vcflib `vcfbreakmulti` doesn't handle annotations correctly when splitting multi-allelic variants.
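A splitting tool has to cut every per-allele field down to the one ALT allele that each new record keeps. Below is a minimal Python sketch of that subsetting. It assumes diploid samples and that the `Number=` value of each FORMAT field is already known from the VCF header. The function names are hypothetical, and this is not how vcflib or ngs-bits implement it.

```python
# Hypothetical sketch: subsetting per-allele FORMAT values when one
# multi-allelic VCF record is split into biallelic records.
# Not taken from vcflib or ngs-bits; diploid samples assumed.
from typing import Dict, Iterator, List


def genotype_index(a: int, b: int) -> int:
    """Position of diploid genotype a/b in a Number=G list (VCF ordering)."""
    lo, hi = min(a, b), max(a, b)
    return hi * (hi + 1) // 2 + lo


def subset_values(values: List[str], number: str, alt_index: int) -> List[str]:
    """Keep the values that belong to REF and to ALT number `alt_index` (1-based)."""
    if values == ["."]:
        return ["."]  # missing value stays missing
    if number == "A":  # one value per ALT allele
        return [values[alt_index - 1]]
    if number == "R":  # REF value first, then one value per ALT allele
        return [values[0], values[alt_index]]
    if number == "G":  # one value per genotype: keep 0/0, 0/k and k/k
        k = alt_index
        return [values[genotype_index(0, 0)],
                values[genotype_index(0, k)],
                values[genotype_index(k, k)]]
    return list(values)  # fixed-count or unbounded ('.') fields are copied as-is


def split_format(fmt: Dict[str, List[str]], numbers: Dict[str, str],
                 n_alt: int) -> Iterator[Dict[str, List[str]]]:
    """Yield one FORMAT dict per ALT allele; GT is left out (it needs its own recoding)."""
    for k in range(1, n_alt + 1):
        yield {key: subset_values(vals, numbers.get(key, "."), k)
               for key, vals in fmt.items() if key != "GT"}


if __name__ == "__main__":
    # Illustrative values, not taken from a real sample.
    numbers = {"AD": "R", "AF": "A", "PL": "G", "DP": "1"}
    fmt = {"AD": ["10", "5", "3"], "AF": ["0.28", "0.17"],
           "PL": ["90", "0", "40", "60", "30", "80"], "DP": ["18"]}
    for record in split_format(fmt, numbers, n_alt=2):
        print(record)
```

With two ALT alleles, a Number=G field such as PL holds six values in the order 0/0, 0/1, 1/1, 0/2, 1/2, 2/2. The record for the second ALT therefore keeps positions 0, 3 and 5 (here `90,60,80`), and AD keeps `10,3`.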
- create a megSAP test for this error (e.g. variant calling on the extracted region of DNA2106177 shown below)
- test whether ngs-bits VcfBreakMulti works correctly (also on a whole WGS sample, e.g. NA12878_45)
- benchmark the runtime (on a whole WGS sample, e.g. NA12878_45)
- implement handling of phased genotypes
- add tests for different phased genotype combinations
- replace the vcflib tool with the ngs-bits tool in megSAP
- merge with master
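Phased genotypes are the tricky part of splitting. In `1|2` the allele order records which haplotype carries which ALT, so the split records must keep that order. An unphased `1/2` carries no such information and can be normalized. Below is a minimal sketch, assuming one separator type per GT value. `recode_gt` and its `other` parameter are hypothetical. Tools differ in whether the remaining ALT alleles become REF (`0`) or missing (`.`), so that choice is left as a parameter here.

```python
# Hypothetical sketch of recoding GT for each biallelic record produced by a
# split, preserving phase. Not the ngs-bits or vcflib implementation.
import re


def recode_gt(gt: str, alt_index: int, other: str = "0") -> str:
    """Recode GT for the record that keeps ALT number `alt_index` (1-based).

    The kept ALT becomes 1, REF stays 0, missing stays '.', and any other
    ALT allele becomes `other` ('0' or '.').
    Assumes a single separator type ('/' or '|') per GT value.
    """
    phased = "|" in gt
    out = []
    for allele in re.split(r"[/|]", gt):
        if allele == ".":
            out.append(".")
        elif allele == "0":
            out.append("0")
        elif int(allele) == alt_index:
            out.append("1")
        else:
            out.append(other)
    if phased:
        # Phased: allele order encodes the haplotype and must be kept.
        return "|".join(out)
    # Unphased: order carries no information, so sort for a canonical form
    # ('.' sorts before '0', which sorts before '1').
    return "/".join(sorted(out))


if __name__ == "__main__":
    for gt in ("1|2", "2|1", "1/2", "0|2"):
        print(gt, "->", recode_gt(gt, 1), recode_gt(gt, 2))
    # 1|2 -> 1|0 0|1
    # 2|1 -> 0|1 1|0
    # 1/2 -> 0/1 0/1
    # 0|2 -> 0|0 0|1
```

Tests for the phased constellations can then compare both split records against the expected allele order, e.g. `1|2` and `2|1` must yield mirrored results.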
Example:
```sh
tabix -h /mnt/storage2/projects/diagnostic/Genome_Diagnostik/Sample_DNA2106177A1_02/dragen_variant_calls/DNA2106177A1_02_dragen.vcf.gz chr1:4772080-4772085 | /mnt/storage2/megSAP/tools/vcflib-1.0.3/build/vcfbreakmulti | VcfCheck
```
Runtime on NA12878_45_var_annotated.vcf:
- vcfbreakmulti (vcflib): 3min 28sec
- VcfBreakMulti (ngs-bits): 20sec
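Wall-clock numbers like these can be collected with a small harness. Below is a Python sketch that assumes the tool reads VCF from stdin and writes to stdout, as `vcfbreakmulti` does in the pipeline above. The name `time_filter` and the output paths are hypothetical.

```python
# Sketch of a wall-clock timing harness for a stdin -> stdout VCF tool.
import subprocess
import time
from typing import List


def time_filter(cmd: List[str], in_path: str, out_path: str) -> float:
    """Run `cmd` with in_path as stdin and out_path as stdout; return seconds elapsed."""
    with open(in_path, "rb") as fin, open(out_path, "wb") as fout:
        start = time.perf_counter()
        subprocess.run(cmd, stdin=fin, stdout=fout, check=True)  # raises on non-zero exit
        return time.perf_counter() - start


if __name__ == "__main__":
    # Hypothetical output path; tool path taken from the example command.
    seconds = time_filter(
        ["/mnt/storage2/megSAP/tools/vcflib-1.0.3/build/vcfbreakmulti"],
        "NA12878_45_var_annotated.vcf", "vcflib_split.vcf")
    print(f"vcfbreakmulti: {seconds:.1f}s")
```

Writing the output to a file rather than a pipe keeps downstream tools such as VcfCheck out of the measurement.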
VcfCheck output (vcflib):
```
WARNING: First base of insertion/deletion not matching - ref: 'T' alt: 'GC'! - in line 3696:
chr1 2331965 . T GC 258 low_conf_region ABP=44;CSQ=GC|intron_variant|ENST00000378531.8|Transcript|||protein_coding|,GC|regulatory_region_variant|ENSR00001164926|RegulatoryFeature|||TF_binding_site|;CSQ2=GC|intron_variant|MODIFIER|MORN1|HGNC:25852|ENST00000378531.8|Transcript||12/13|c.1250+4504delinsGC|;MES_SWA=0.0&0.3&-1.1&0.0&0.0&-17.6&ENST00000378531;MQM=60;NGSD_COUNTS=0,715,0;NGSD_GENE_INFO=MORN1%20(inh%3Dn/a%20oe_syn%3D1.00%20oe_mis%3D1.05%20oe_lof%3D0.77);NGSD_GROUP=0,121;SAF=7;SAP=6;SAR=12;SpliceAI=GC|MORN1|0.00|0.00|0.00|0.00|-46|-14|8|42 GT:DP:AO:GQ 0/1:76:19:142
```
VcfCheck output (ngs-bits):
```
WARNING: First base of insertion/deletion not matching - ref: 'T' alt: 'GC'! - in line 3696:
chr1 2331965 . T GC 258 low_conf_region MQM=60;SAP=6;SAR=12;SAF=7;ABP=44;CSQ=GC|intron_variant|ENST00000378531.8|Transcript|||protein_coding|,GC|regulatory_region_variant|ENSR00001164926|RegulatoryFeature|||TF_binding_site|;CSQ2=GC|intron_variant|MODIFIER|MORN1|HGNC:25852|ENST00000378531.8|Transcript||12/13|c.1250+4504delinsGC|;MES_SWA=0.0&0.3&-1.1&0.0&0.0&-17.6&ENST00000378531;SpliceAI=GC|MORN1|0.00|0.00|0.00|0.00|-46|-14|8|42;NGSD_COUNTS=0,715,0;NGSD_GROUP=0,121;NGSD_GENE_INFO=MORN1%20(inh%3Dn/a%20oe_syn%3D1.00%20oe_mis%3D1.05%20oe_lof%3D0.77) GT:DP:AO:GQ 0/1:76:19:142
```
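Both outputs contain the same warning for `chr1 2331965 T>GC`. REF and ALT differ in length but do not share a leading base; the variant is a deletion-insertion, as the `c.1250+4504delinsGC` in CSQ2 shows. The VCF specification expects length-changing REF/ALT strings to include the base before the event. Below is a minimal sketch of the kind of rule that produces this warning. It assumes VcfCheck treats any REF/ALT pair of unequal length as an insertion/deletion. The function is hypothetical, and VcfCheck's actual logic may differ.

```python
# Hypothetical sketch of a "first base of insertion/deletion" check.
# Assumption: any REF/ALT pair of unequal length counts as an indel.

def first_base_mismatch(ref: str, alt: str) -> bool:
    """True if REF/ALT differ in length but do not share the leading base."""
    if alt.startswith("<") or alt in ("*", "."):
        return False  # symbolic, spanning-deletion or missing ALT: not checked here
    if len(ref) == len(alt):
        return False  # SNV or MNV: no padding base expected
    return ref[0].upper() != alt[0].upper()


if __name__ == "__main__":
    for ref, alt in (("T", "GC"), ("ATTTTT", "A"), ("T", "TC")):
        print(ref, alt, first_base_mismatch(ref, alt))
```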
EDIT: After re-analysis of NA12878_45, VcfCheck reports no warnings for either the ngs-bits VcfBreakMulti result or the vcflib vcfbreakmulti result.
When calling on the extracted region of DNA2106177:
- ngs-bits VcfBreakMulti passes VcfCheck without warnings
- vcflib vcfbreakmulti produces 6 VcfCheck warnings (VcfCheck_DNA2106177_region_vcflib_out.txt)
For example:
```
WARNING: For sample 'DNA2106177A1_02 / annotation 'AD' (number=R), the number of values is 3, but should be 2! - in line 2619:
chr1 4772083 . ATTTTT A 388.8 PASS AC=1;AF=0.500;AN=2;DP=46;FS=0.000;FractionInformativeReads=0.911;MQ=250.00;MQRankSum=0.000;QD=8.45;ReadPosRankSum=0.000;SOR=1.075 GT:AD:AF:DP:F1R2:F2R1:GQ:PL:GP:PRI:SB:MB ./1:0,24,17:0.5854:41:0,9,10:0,15,7:44:411,350,54,1048,0,58:3.8880e+02,3.3880e+02,4.5857e+01,1.0373e+03,1.5617e-04,5.0000e+01:0.00,11.00,14.01,11.00,22.00,14.01:0,0,24,17:0,0,28,13
```
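The AD warning is a value-count mismatch. A Number=R field carries one value for REF plus one per ALT, so a record with the single ALT `A` needs 2 AD values, not 3. Below is a minimal sketch of that check on one sample column. `check_sample` is hypothetical. The `Number=` values would normally be read from the VCF header; only `AD` = R is stated in the warning.

```python
# Hypothetical sketch of a Number= vs. value-count check for one sample column.
from math import comb
from typing import Dict, List, Optional


def expected_count(number: str, n_alt: int, ploidy: int = 2) -> Optional[int]:
    """Expected number of values for a FORMAT field with the given Number=."""
    if number == "A":
        return n_alt
    if number == "R":
        return n_alt + 1
    if number == "G":
        return comb(n_alt + ploidy, ploidy)  # genotypes over n_alt + 1 alleles
    if number.isdigit():
        return int(number)
    return None  # '.' (unbounded) or unknown: not checked


def check_sample(format_col: str, sample_col: str,
                 numbers: Dict[str, str], n_alt: int) -> List[str]:
    """Return one message per FORMAT field whose value count is wrong."""
    problems = []
    for key, raw in zip(format_col.split(":"), sample_col.split(":")):
        expected = expected_count(numbers.get(key, "."), n_alt)
        count = len(raw.split(","))
        if expected is not None and raw != "." and count != expected:
            problems.append(f"{key}: {count} values, expected {expected}")
    return problems


if __name__ == "__main__":
    fmt = "GT:AD:DP"
    sample = "./1:0,24,17:41"  # GT, AD and DP from the record above
    print(check_sample(fmt, sample, {"AD": "R", "DP": "1"}, n_alt=1))
    # ['AD: 3 values, expected 2']
```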