eldariont/svim

Merge the two-line translocation (BND) records into one line

Closed this issue · 2 comments

In the output, I found that each translocation (BND) is split across two lines: position 1 and position 2 are swapped, and the svim.BND.xxx SV ID also differs. I want to merge every translocation event into one line. Is there a recommended method?
I am using multiple SV callers and found that some of them, such as manta and svaba, also write two lines per translocation, while others, such as sniffles and cuteSV, write only one line per translocation.

chr1    53900   svim.BND.7      N       N[chr2:11656999[        1       PASS    SVTYPE=BND;SUPPORT=1;STD_POS1=.;STD_POS2=.      GT:DP:AD        ./.:.:.,.
chr2    11656999        svim.BND.156311 N       ]chr1:53900]N   1       PASS    SVTYPE=BND;SUPPORT=1;STD_POS1=.;STD_POS2=.      GT:DP:AD        ./.:.:.,.

Hi,

Yes, SVIM writes each translocation as two lines (one record per breakend) in order to follow the specification of the Variant Calling Format (see Section 5.4 on page 17).
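
To make the notation concrete, here is a minimal sketch of how the ALT field of a breakend record can be read. It assumes the four ALT forms described in the specification (t[p[, t]p], ]p]t and [p[t); the names BND_ALT and parse_bnd_alt are made up for this illustration and are not part of SVIM or any library.

```python
import re

# The four breakend ALT forms (t = local base(s), p = mate locus as chrom:pos):
#   t[p[   t]p]   ]p]t   [p[t
BND_ALT = re.compile(
    r"^(?P<lead>[^\[\]]*)"          # bases before the bracket (empty for ]p]t and [p[t)
    r"(?P<bracket>[\[\]])"          # opening bracket
    r"(?P<chrom>.+):(?P<pos>\d+)"   # mate locus; greedy .+ tolerates ':' in contig names
    r"(?P=bracket)"                 # closing bracket must match the opening one
    r"(?P<trail>[^\[\]]*)$"         # bases after the bracket (empty for t[p[ and t]p])
)

def parse_bnd_alt(alt):
    """Return (mate_chrom, mate_pos, bracket, local_base_first) for a BND ALT string."""
    m = BND_ALT.match(alt)
    if m is None:
        raise ValueError(f"not a breakend ALT: {alt!r}")
    local_base_first = m.group("lead") != ""
    return m.group("chrom"), int(m.group("pos")), m.group("bracket"), local_base_first

# The two ALT strings from the records in the question
print(parse_bnd_alt("N[chr2:11656999["))
print(parse_bnd_alt("]chr1:53900]N"))
```

Each of the two records points at the position of the other one, which is why a single translocation shows up as a pair of mirrored lines.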

If you need only one line per translocation you could use a simple Python script similar to the one below.

Cheers
David

import sys
from cyvcf2 import VCF, Writer

# Arguments: input VCF path, output VCF path
vcf_file = VCF(sys.argv[1])
out_file = Writer(sys.argv[2], vcf_file)
for variant in vcf_file:
	# Only BND records are written to the output; other SV types are skipped
	if variant.INFO.get("SVTYPE") == "BND":
		from_chrom = variant.CHROM
		from_pos = variant.POS
		alt_string = variant.ALT[0]
		#fwd direction at pos1: N[chr:pos[ or N]chr:pos]
		if alt_string[0] not in "[]":
			mate = alt_string[2:-1]
		#rev direction at pos1: ]chr:pos]N or [chr:pos[N
		else:
			mate = alt_string[1:-2]
		# Split on the last colon so contig names containing ":" still parse
		to_chrom, sep, to_pos = mate.rpartition(":")
		assert sep == ":" and to_chrom
		to_pos = int(to_pos)
		# Keep only the breakend with the smaller (chromosome, position),
		# so each pair is written once (chromosome names compare as strings)
		if (from_chrom, from_pos) < (to_chrom, to_pos):
			out_file.write_record(variant)
vcf_file.close()
out_file.close()
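
The selection rule used in the script can be checked on the two records from the question without reading a VCF file. This is only a sketch using the standard library; is_first_mate is a hypothetical helper name, and the tuples below simply copy CHROM, POS, ID and ALT from the example lines.

```python
import re

def is_first_mate(chrom, pos, alt):
    """True if this breakend sorts before its mate by (chromosome, position)."""
    # Splitting on the brackets leaves the mate locus "chrom:pos" in the middle
    mate = re.split(r"[\[\]]", alt)[1]
    mate_chrom, _, mate_pos = mate.rpartition(":")
    return (chrom, pos) < (mate_chrom, int(mate_pos))

# CHROM, POS, ID, ALT of the two example records
records = [
    ("chr1", 53900, "svim.BND.7", "N[chr2:11656999["),
    ("chr2", 11656999, "svim.BND.156311", "]chr1:53900]N"),
]

kept = [rec_id for chrom, pos, rec_id, alt in records if is_first_mate(chrom, pos, alt)]
print(kept)
```

Only the chr1 record passes the check, so the pair collapses to one line. Note that the kept record still carries its own ID and orientation; the second record is dropped, not merged into it.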

Thank you very much! I will try it.