Heteroplasmy Detection Issue for Variants Present in Excess of 90%
BarbaraSlap opened this issue · 1 comments
We are trying to identify heteroplasmy levels in nanopore sequencing data using Mutserve2 with default settings. We simulated various levels of heteroplasmy for one mitochondrial variant in silico. We observed that the proportion of called variants starts to decline at heteroplasmy ratios between 0.93 and 0.97, mainly between 0.93 and 0.94.
This means that the variant, although present in more than 90% of reads, wasn't called in about 70 out of 999 attempts. Furthermore, the variant is listed in the .txt file but flagged as strand bias, so it is filtered out and doesn't appear in the final VCF file.
Do you have any ideas to explain this anomaly? Are there specific parameters that can be adjusted to address this issue?
That's interesting. We didn't see that when we ran the validations with lab-based mtDNA mixtures on the GridION: https://www.frontiersin.org/articles/10.3389/fgene.2022.887644/full
Could you check the per-base qualities of the data, e.g. with FastQC? What is the mean per-base quality? How did you simulate the data? Basically, the parameters you could tweak are --baseQ, --mapQ and --alignQ.
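If FastQC is not at hand, a rough check on a single read can be sketched in a few lines of Python. This is only an illustration: it assumes the FASTQ quality line uses the common Phred+33 encoding, and the function names and the threshold of 20 are placeholders chosen for the example, not Mutserve settings.

```python
def phred_scores(quality_line, offset=33):
    """Decode a FASTQ quality line into Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality_line]


def mean_phred(quality_line, offset=33):
    """Arithmetic mean of the Phred scores of one read."""
    scores = phred_scores(quality_line, offset)
    return sum(scores) / len(scores)


def fraction_below(quality_line, threshold, offset=33):
    """Fraction of bases whose Phred score is below the given threshold."""
    scores = phred_scores(quality_line, offset)
    return sum(1 for q in scores if q < threshold) / len(scores)


if __name__ == "__main__":
    example = "I5+5"  # hypothetical quality line: scores 40, 20, 10, 20
    print(mean_phred(example))
    print(fraction_below(example, threshold=20))
```

If a large fraction of bases falls below whatever base-quality cutoff is in effect, fewer reads contribute to the call at that position, which can change the per-strand counts.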
Strand bias indicates that the variant is found at significantly different levels on the forward and reverse strands. Known causes of strand bias include the homopolymeric C-stretches around bp 310, 955, 16189, ...
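To make "significantly different levels" concrete, here is a minimal sketch that compares variant counts per strand with a two-sided Fisher's exact test. This is a generic illustration and not necessarily the test Mutserve applies internally; the function names and the example counts are made up.

```python
from math import comb


def alt_fraction(alt_count, ref_count):
    """Fraction of reads on one strand that carry the variant allele."""
    return alt_count / (alt_count + ref_count)


def fisher_exact_two_sided(fwd_alt, fwd_ref, rev_alt, rev_ref):
    """Two-sided Fisher's exact test on the 2x2 strand-by-allele table."""
    row_fwd = fwd_alt + fwd_ref
    row_rev = rev_alt + rev_ref
    col_alt = fwd_alt + rev_alt
    total = row_fwd + row_rev
    denom = comb(total, col_alt)

    def prob(x):
        # Hypergeometric probability of x variant reads on the forward strand
        return comb(row_fwd, x) * comb(row_rev, col_alt - x) / denom

    p_observed = prob(fwd_alt)
    lowest = max(0, col_alt - row_rev)
    highest = min(row_fwd, col_alt)
    # Sum all tables that are at most as likely as the observed one
    p_value = sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)
    )
    return min(1.0, p_value)


if __name__ == "__main__":
    # Hypothetical counts: the variant is less frequent on the forward strand
    print(alt_fraction(450, 50), alt_fraction(495, 5))
    print(fisher_exact_two_sided(450, 50, 495, 5))
```

A small p-value here means the variant level differs between strands by more than chance would explain, which is the kind of pattern that triggers a strand-bias flag.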