Phasing interpretation
Hello,
I am just wondering how I should interpret the unphased genotypes in the context of the 'variants.vcf' output file. Do those really matter?
Hi,
currently, SVIM does not phase SVs at all, so all variants in the variants.vcf output are unphased.
In case you were referring to variants without a genotype (i.e. ./. in the GT column): these happen when there is not enough evidence in the read alignments to tell whether a variant is homozygous or heterozygous. SVIM has a few parameters you can set that define which variants are assigned a genotype. For instance, --minimum_depth defines the minimum read depth over the variant that needs to be reached for a genotype to be assigned.
Does this clear things up, or did I misunderstand your question?
Cheers
David
Thanks for your answer, but I think you misunderstood the question. Basically, I am wondering what a 0/0 genotype would mean.
I understand that typically 0 is for REF and 1 for ALT. According to this, a 0/1 would mean that we have both alleles present in the query reads and, following this, the 1/1 would also make sense to me. But I am having a hard time understanding the 0/0 genotype. I know this may be a more general question about the VCF format, but I'd like to know if this has more implications for the main output file of SVIM.
Best,
Javier
Hi Javier,
thanks for clarifying your question. I think I got confused by the word "phasing" but it's clear now.
You are right that 0 stands for the reference allele while 1 stands for the alternative allele. At any position in the reference genome there are three theoretical possibilities: 1/1 (homozygous variant), 0/1 (heterozygous variant) and 0/0 (homozygous reference). The 0/0 means that both parental haplotypes carry the reference allele, i.e. the position is not a variant at all. But why does SVIM report these positions then?
The reason comes from the approach SVIM uses to detect variants. In the first stage of the pipeline, SVIM searches every read alignment individually for signatures of SVs. That means that even a deletion supported by only a single read is detected and recorded. Later, in the genotyping stage, SVIM tries to find the genotype for each SV. To this end, all read alignments close to the SV coordinates are analyzed and we count the number of reads supporting the variant (V) and the number of those supporting the reference allele (R). Then, the ratio V/(R+V) of supporting reads is computed and compared against thresholds that can be modified by the user. If the ratio is very high, the variant gets the genotype 1/1. If the ratio is around 50%, the variant gets the genotype 0/1. And if the ratio is very low, the variant gets the genotype 0/0.
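To make the ratio rule concrete, here is a minimal Python sketch of the decision described above. It is not SVIM's actual code: the function name, the two threshold values (0.8 and 0.2) and the depth cutoff of 4 are illustrative assumptions, so check the SVIM help output for the real parameter names and defaults.

```python
def call_genotype(variant_reads, reference_reads,
                  hom_threshold=0.8, het_threshold=0.2, minimum_depth=4):
    """Assign a genotype from read counts (illustrative sketch, not SVIM code).

    variant_reads   -- V, reads supporting the variant
    reference_reads -- R, reads supporting the reference allele
    The threshold and depth defaults are assumed values for illustration.
    """
    depth = variant_reads + reference_reads
    # Too little evidence: leave the genotype undetermined.
    if depth < minimum_depth:
        return "./."
    ratio = variant_reads / depth  # V / (R + V)
    if ratio >= hom_threshold:
        return "1/1"  # almost all reads support the variant
    if ratio >= het_threshold:
        return "0/1"  # roughly half of the reads support the variant
    return "0/0"      # most reads support the reference allele


# Hypothetical read counts for three candidate SVs
for v, r in [(9, 1), (5, 5), (1, 9)]:
    print(v, r, call_genotype(v, r))
```

With these assumed thresholds, a candidate seen in 1 of 10 reads has a ratio of 0.1 and ends up as 0/0, which is exactly the case discussed next.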
So what are those variants with 0/0? They basically represent SVs that are supported by a few reads while the majority of reads in the region supports the reference allele. They could be real SVs or artifacts caused by wrong alignments. I would recommend that you have a look at some of them together with the input read alignments in a genome browser like IGV to get a better understanding of them.
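If it helps to collect the 0/0 calls before opening IGV, the following is a small sketch in plain Python that pulls them out of a VCF. It assumes a single-sample VCF with a GT entry in the FORMAT column; the function name is hypothetical and nothing here is part of SVIM itself.

```python
def records_with_genotype(vcf_lines, genotype="0/0"):
    """Yield (chrom, pos, id) for records whose first sample has the given GT.

    vcf_lines can be any iterable of VCF text lines, e.g. an open file handle.
    """
    for line in vcf_lines:
        # Skip header lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        # Need the 8 fixed columns plus FORMAT and at least one sample.
        if len(fields) < 10:
            continue
        format_keys = fields[8].split(":")
        sample_values = fields[9].split(":")
        if "GT" not in format_keys:
            continue
        if sample_values[format_keys.index("GT")] == genotype:
            yield fields[0], int(fields[1]), fields[2]

# Usage (assuming variants.vcf is in the working directory):
#   with open("variants.vcf") as handle:
#       for chrom, pos, variant_id in records_with_genotype(handle):
#           print(chrom, pos, variant_id)
```

The printed coordinates can then be pasted into the IGV locus box one at a time.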
Best,
David
Hi David,
Yeah, I shouldn’t have used the word "phasing". At any rate, thank you for the great explanation, now it makes sense to me.
Best,
Javier