merge the two lines translocation(BND) records into one line
Closed this issue · 2 comments
In the output, I found out that translocation (BND) are separated into two lines, the position1
and position2
is reversed, the svim.BND.xxx
sv ID is also different. I want to merge every translocation event into one line. Is there any recommended method?
I am using multi SV callers, and found out that some callers do the same thing for translocations like manta
and svaba
some callers do not (only one line for each translocation) like sniffles
and cuteSV
.
chr1 53900 svim.BND.7 N N[chr2:11656999[ 1 PASS SVTYPE=BND;SUPPORT=1;STD_POS1=.;STD_POS2=.
GT:DP:AD ./.:.:.,.
chr2 11656999 svim.BND.156311 N ]chr1:53900]N 1 PASS SVTYPE=BND;SUPPORT=1;STD_POS1=.;STD_POS2=. GT:DP:AD ./.:.:.,.
Hi,
yes, SVIM separates translocations into two lines in order to follow the specification of the Variant Calling Format (see Section 5.4 on page 17).
If you need only one line per translocation you could use a simple Python script similar to the one below.
Cheers
David
import sys
from cyvcf2 import VCF, Writer
vcf_file = VCF(sys.argv[1])
out_file = Writer(sys.argv[2], vcf_file)
for variant in vcf_file:
if variant.INFO["SVTYPE"] == "BND":
from_chrom = variant.CHROM
from_pos = variant.POS
alt_string = variant.ALT[0]
#fwd direction at pos1
if alt_string[0] == "N":
pos_fields = alt_string[2:-1].split(":")
assert len(pos_fields) == 2
to_chrom = pos_fields[0]
to_pos = int(pos_fields[1])
#rev direction at pos1
else:
pos_fields = alt_string[1:-2].split(":")
assert len(pos_fields) == 2
to_chrom = pos_fields[0]
to_pos = int(pos_fields[1])
if from_chrom < to_chrom:
out_file.write_record(variant)
elif from_chrom == to_chrom:
if int(from_pos) < int(to_pos):
out_file.write_record(variant)
vcf_file.close()
out_file.close()
Thanks you very much! I will try it.