Dealing with structural variants
Hello @jodyphelan,
We are currently adding the analysis of IS6110 insertions to our structural variant workflow, but we are running into some unexpected TBProfiler outputs.
The attached VCF (fixed_joint.delly.vcf.gz) contains a number of structural variants, detected by either Delly or ISMapper, but only one of these (pncA_c.-2148_*747del) appears to be processed by TBProfiler. At first we thought this could be an issue with the format of the ISMapper variants, but we've noticed that some of the Delly variants are also not being reported. This is expected for those occurring in non-DR regions, which are simply added to the total_variants count, but we have found at least one variant within a DR region (a 1359 bp deletion in Rv2477c, a Tier 2 gene for STM, MFX, EMB, LFX, AMK, RIF & KAN) that also does not show up in the JSON outputs.
Do you think this issue is related to the format of our input vcf, or perhaps these variants are not being correctly interpreted by SnpEff?
Thanks,
Miguel de Diego Fuertes
I'll take a look at this and get back to you asap!
It looks like it could be an issue to do with snpEff not annotating insertions with the alt set to <INS>.
Is there any way you could get the actual sequence that is inserted?
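To illustrate why the actual sequence matters, here is a minimal sketch of how a symbolic <INS> record could be rewritten with an explicit alt allele once the inserted sequence is known. The function name is hypothetical and not part of TBProfiler or snpEff. It assumes the REF column already holds the real anchor base at POS and that the inserted sequence is supplied in the correct orientation.

```python
def expand_symbolic_insertion(vcf_line: str, inserted_seq: str) -> str:
    """Rewrite a symbolic <INS> record with an explicit ALT allele.

    Hypothetical helper for illustration only. Assumes REF is the real
    anchor base at POS and inserted_seq is in the right orientation.
    """
    # Header lines are passed through untouched.
    if vcf_line.startswith("#"):
        return vcf_line

    fields = vcf_line.split("\t")
    ref, alt = fields[3], fields[4]

    # Only records whose ALT is exactly the symbolic <INS> are changed.
    if alt != "<INS>":
        return vcf_line

    # Explicit insertion: the anchor base followed by the inserted bases.
    fields[4] = ref + inserted_seq.upper()
    return "\t".join(fields)
```

For example, a record with REF `A` and ALT `<INS>` combined with the inserted sequence `tgaacc` would end up with ALT `ATGAACC`. Any INFO tags describing the symbolic allele are left unchanged here.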
Hi Jody,
Apologies for the delay in replying, and thanks for taking a look at this. The inserted sequence, in the case of IS6110 insertions, is attached here (in txt format).
While this might solve the IS6110 insertions, the structural variants detected by Delly still pose an issue, since the alt sequences are variable and can span several thousand bp. Do you have any thoughts on how these could be dealt with?
Thanks again,
Miguel de Diego Fuertes
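For the Delly deletions, one option that might be worth trying is to rebuild explicit REF and ALT alleles from the reference genome, however long they are. The sketch below is only an illustration under stated assumptions: the function names are hypothetical, the record is assumed to follow the VCF symbolic allele convention (POS is the anchor base before the deletion and INFO/END is the last deleted base), and the reference is assumed to be a single chromosome already loaded as a string. Whether snpEff and TBProfiler then annotate such records correctly has not been verified here.

```python
def parse_info(info: str) -> dict:
    """Split a VCF INFO column into a key -> value dict (flags map to '')."""
    parsed = {}
    for item in info.split(";"):
        key, _, value = item.partition("=")
        parsed[key] = value
    return parsed


def expand_symbolic_deletion(vcf_line: str, ref_seq: str) -> str:
    """Rewrite a symbolic <DEL> record with explicit REF and ALT alleles.

    Hypothetical helper for illustration only. Assumes POS is the anchor
    base preceding the deletion and INFO/END is the last deleted base
    (both 1-based), and that ref_seq is the whole chromosome sequence.
    """
    # Header lines are passed through untouched.
    if vcf_line.startswith("#"):
        return vcf_line

    fields = vcf_line.split("\t")
    if fields[4] != "<DEL>":
        return vcf_line

    pos = int(fields[1])
    end = int(parse_info(fields[7])["END"])

    # REF spans the anchor base through the last deleted base;
    # ALT keeps only the anchor base.
    fields[3] = ref_seq[pos - 1:end].upper()
    fields[4] = ref_seq[pos - 1].upper()
    return "\t".join(fields)
```

With a toy reference `ACGTACGTAC`, a `<DEL>` record at POS 3 with END=6 would become REF `GTAC` and ALT `G`.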