Confusion about output
Hello :)
I have been trying out SVIM for the past few days to call SVs between a reference genome and one sample. I was curious about the overlap between structural variants, especially deletions and duplications, and the gene models on the genome. This made me realize that multiple deletion calls sometimes overlap each other. Here is an example of four deletions that span a gene and overlap each other:
Tlei_chr8 34630820 svim.DEL.34124 N <DEL> 18 PASS SVTYPE=DEL;END=34633647;SVLEN=-2827;SUPPORT=15;STD_SPAN=3.25;STD_POS=123.58 GT:DP:AD 1/1:15:0,15
Tlei_chr8 34604070 svim.DEL.34097 N <DEL> 27 hom_ref SVTYPE=DEL;END=34633581;SVLEN=-29511;SUPPORT=22;STD_SPAN=2.74;STD_POS=86.44 GT:DP:AD 0/0:340:318,22
Tlei_chr8 34604908 svim.DEL.34100 N <DEL> 16 hom_ref SVTYPE=DEL;END=34634412;SVLEN=-29504;SUPPORT=13;STD_SPAN=1.21;STD_POS=99.86 GT:DP:AD 0/0:311:298,13
Tlei_chr8 34622307 svim.DEL.34120 N <DEL> 13 PASS SVTYPE=DEL;END=34651819;SVLEN=-29512;SUPPORT=11;STD_SPAN=2.54;STD_POS=106.23 GT:DP:AD 0/1:20:9,11
I can see that two of the four deletions are genotyped as homozygous reference (0/0), but then I am very confused about how they were called with relatively high scores (16 and 27). This makes it unclear to me how I should filter the raw dataset and what I should do with variants that do not appear to actually be present, or that have little support (22 reads supporting the variant vs. 318 supporting the reference allele). If deletions still overlap after filtering, should they be merged, or should I assume that only one of the called deletions is accurate?
I am quite new to calling SV, so please let me know if I misunderstood anything. :)
Best,
Clara
Thank you!
Hi Clara,
thanks for reporting this issue. These are all very good questions and I will try to answer them.
First of all, opening your alignment file in a genome browser like IGV might help you a lot in understanding the results that SVIM produced. Whenever I get a curious result, I have a look at the alignments in the region. In your case, I suspect that many alignments contradict each other. There seem to be 15, 22, 13 and 11 reads supporting the four deletions, respectively, and some of these alignments might be incorrect or simply the result of ambiguous alignment (multiple different alignments of the same read yielding a similar alignment score). Unless there is a bug (which can never be ruled out), SVIM only detects what is in the input alignments, and any mistakes in them will also show up in the variant calls.
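To inspect all four calls at once, one option is to compute a single region spanning them and paste it into IGV's locus box, which accepts the `chrom:start-end` form. The sketch below does this for the four deletions from the question; `locus_for` and the 1 kb padding are hypothetical choices for illustration, not part of SVIM or IGV.

```python
# Hypothetical helper (not part of SVIM or IGV): build one padded region string
# covering several calls on the same chromosome.
# (ID, CHROM, POS, END) copied from the four deletions in the question.
CALLS = [
    ("svim.DEL.34124", "Tlei_chr8", 34630820, 34633647),
    ("svim.DEL.34097", "Tlei_chr8", 34604070, 34633581),
    ("svim.DEL.34100", "Tlei_chr8", 34604908, 34634412),
    ("svim.DEL.34120", "Tlei_chr8", 34622307, 34651819),
]


def locus_for(calls, padding=1000):
    """Return 'chrom:start-end' spanning all calls, widened by `padding` bp."""
    chroms = {chrom for _, chrom, _, _ in calls}
    if len(chroms) != 1:
        raise ValueError("all calls must lie on the same chromosome")
    start = min(pos for _, _, pos, _ in calls)
    stop = max(end for _, _, _, end in calls)
    # Clamp at 1 so the padded start never drops below the first base.
    return f"{chroms.pop()}:{max(1, start - padding)}-{stop + padding}"


print(locus_for(CALLS))
```

For these calls the region runs from 1 kb before the leftmost POS to 1 kb after the rightmost END, which gives some flanking context for judging whether alignments extend into the deleted sequence.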
About the genotypes: after detecting and clustering the SV signatures from the reads (e.g. finding 22 signatures for the second deletion, svim.DEL.34097), SVIM continues with genotyping. It scans the region around the variant and looks for read alignments contradicting the variant. In the case of a deletion, these are alignments that extend far into the deleted region. For the second deletion, svim.DEL.34097, 318 such reads were found, indicating that only a minority of reads in the region (22 out of 340) support the variant. This is why the call has a high score but is still genotyped as homozygous reference. It's hard to recommend what to do with such calls, as they are often caused by inconsistent read alignments. Looking at the alignments as suggested above might help.
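The difference between score and genotype can be sketched in a few lines: the genotype below depends only on the fraction of reads supporting the deletion, AD alt / (AD ref + AD alt), while the score (QUAL) is not used at all. This is a minimal sketch, not SVIM's code; the thresholds 0.2 and 0.8 are assumptions for illustration, although with these values the sketch happens to reproduce the GT field of all four records in the question. The final line shows one possible filter, dropping calls genotyped as 0/0.

```python
# Minimal sketch, NOT SVIM's implementation: genotype a call from the fraction
# of reads that support it. Both thresholds are assumptions for illustration.
HET_THRESHOLD = 0.2  # assumed: alt fraction below this -> homozygous reference
HOM_THRESHOLD = 0.8  # assumed: alt fraction at or above this -> homozygous alt


def genotype(ref_reads: int, alt_reads: int) -> str:
    """Return a VCF-style GT string from reference and alternative read counts."""
    depth = ref_reads + alt_reads
    if depth == 0:
        return "./."  # no informative reads: genotype unknown
    alt_fraction = alt_reads / depth
    if alt_fraction >= HOM_THRESHOLD:
        return "1/1"
    if alt_fraction >= HET_THRESHOLD:
        return "0/1"
    return "0/0"


# (ID, QUAL, AD ref, AD alt) copied from the four records in the question.
RECORDS = [
    ("svim.DEL.34124", 18, 0, 15),
    ("svim.DEL.34097", 27, 318, 22),
    ("svim.DEL.34100", 16, 298, 13),
    ("svim.DEL.34120", 13, 9, 11),
]

for call_id, qual, ref, alt in RECORDS:
    af = alt / (ref + alt)
    print(f"{call_id}\tQUAL={qual}\tAF={af:.2f}\tGT={genotype(ref, alt)}")

# One possible filter: drop calls genotyped as homozygous reference.
kept = [call_id for call_id, _, ref, alt in RECORDS if genotype(ref, alt) != "0/0"]
print("kept:", kept)
```

For these records, `kept` contains svim.DEL.34124 and svim.DEL.34120, the same two calls that carry `PASS` in the FILTER column. The highest-scoring call (QUAL 27) is dropped because only about 6% of the reads in its region support it.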
If deletions still overlap after filtering, there is no right or wrong approach. If you have the time, it's always best to take a closer look at the alignments. Even though the alignments (and the resulting variant calls) might differ from each other, they could still originate from the same variant.
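One widely used heuristic for overlapping calls (an assumption here, not something SVIM does) is to treat two deletions as candidates for the same event when they share at least 50% reciprocal overlap, i.e. the shared interval covers at least half of each call. The sketch groups the four deletions from the question this way; `reciprocal_overlap`, `group_calls` and the 0.5 cutoff are hypothetical names and values.

```python
from itertools import combinations

# (ID, POS, END) of the four deletions from the question, all on Tlei_chr8.
DELETIONS = [
    ("svim.DEL.34124", 34630820, 34633647),
    ("svim.DEL.34097", 34604070, 34633581),
    ("svim.DEL.34100", 34604908, 34634412),
    ("svim.DEL.34120", 34622307, 34651819),
]


def reciprocal_overlap(a, b):
    """Smaller of the two fractions: shared length / length of each call."""
    _, a_start, a_end = a
    _, b_start, b_end = b
    overlap = max(0, min(a_end, b_end) - max(a_start, b_start))
    return min(overlap / (a_end - a_start), overlap / (b_end - b_start))


def group_calls(calls, min_overlap=0.5):
    """Single-linkage grouping with union-find: calls linked by a pair with
    reciprocal overlap >= min_overlap end up in the same group."""
    parent = list(range(len(calls)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(calls)), 2):
        if reciprocal_overlap(calls[i], calls[j]) >= min_overlap:
            parent[find(i)] = find(j)

    groups = {}
    for i, call in enumerate(calls):
        groups.setdefault(find(i), []).append(call[0])
    return sorted(groups.values())


for group in group_calls(DELETIONS):
    print(group)
```

With a 0.5 cutoff, svim.DEL.34097 and svim.DEL.34100 (about 97% reciprocal overlap) form one group, while svim.DEL.34120 shares less than half of its length with either and stays separate. The short svim.DEL.34124 is almost entirely covered by each of the larger calls but is also left alone, because reciprocal overlap penalizes large size differences; whether it describes the same event is exactly the kind of question the alignments have to answer.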
I hope that these answers help you somewhat with your questions.
Best
David
Hi David,
Thanks a lot for your extensive reply! It's much clearer now.
I will have a look at the alignments when I find the time, and will come back with questions if more come up.
Best,
Clara