Normalization of INDELs: required or should be avoided
davidyuyuan opened this issue · 3 comments
I was wondering whether INDEL normalization is required for better phasing results or should be avoided because it might also disrupt phasing. I could not find any discussion of INDEL normalization in the documentation or in the closed issues. Could you please shed some light?
Thank you in advance!
Hello,
HiPhase does not require normalization to run successfully (e.g., most TRGT calls have not gone through a traditional normalization step). However, the variant callers we currently support, other than TRGT, should already normalize their variants by default.
If you're using other variant calling software, then I would probably recommend normalizing the variants. For a single VCF file, I doubt this would have a major impact. But if you have two or more input VCFs to phase, normalization becomes more important for detecting identical and/or overlapping calls across the VCFs.
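To see why normalization matters when matching calls across VCFs, consider that the same deletion inside a repeat can be written at several different positions. The following is a minimal Python sketch of the usual left-align-and-trim procedure (the kind of normalization `bcftools norm` performs when given a reference FASTA). It is not HiPhase or bcftools code: the helper name `normalize`, the 8-bp toy reference, and the simplifications (biallelic records only, reference held as an in-memory string) are all illustrative assumptions.

```python
# Illustrative sketch only: hypothetical helper, not bcftools or HiPhase code.
def normalize(ref_seq: str, pos: int, ref: str, alt: str) -> tuple[int, str, str]:
    """Left-align and trim one biallelic VCF record.

    ref_seq: chromosome sequence as a plain string (index 0 == VCF position 1).
    pos, ref, alt: the record's 1-based POS and its REF/ALT alleles.
    Returns the normalized (pos, ref, alt).
    """
    if ref_seq[pos - 1 : pos - 1 + len(ref)] != ref:
        raise ValueError("REF does not match the reference sequence")
    if ref == alt:
        raise ValueError("REF and ALT are identical")

    changed = True
    while changed:
        changed = False
        # Drop a shared trailing base from both alleles.
        if ref and alt and ref[-1] == alt[-1]:
            ref, alt = ref[:-1], alt[:-1]
            changed = True
        # An empty allele is re-anchored on the preceding reference base,
        # which shifts the record one position to the left.
        if not ref or not alt:
            if pos == 1:
                raise ValueError("cannot left-extend past the chromosome start")
            pos -= 1
            base = ref_seq[pos - 1]
            ref, alt = base + ref, base + alt
            changed = True

    # Drop shared leading bases, keeping at least one base in each allele.
    while len(ref) >= 2 and len(alt) >= 2 and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1

    return pos, ref, alt


if __name__ == "__main__":
    TOY_REF = "GCACACAT"  # hypothetical reference: G C A C A C A T
    # Three spellings of the same "CA" deletion within the CA repeat.
    calls = [(4, "CAC", "C"), (2, "CAC", "C"), (5, "ACA", "A")]
    print({normalize(TOY_REF, *c) for c in calls})  # {(1, 'GCA', 'G')}
```

The three raw records differ, so an exact POS/REF/ALT comparison between two VCFs would miss that they describe the same event. After normalization they collapse to a single representation, which is what makes identical calls from different input VCFs straightforward to detect.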
Matt
Thanks, Matt.
I am using bcftools mpileup and bcftools call to generate a VCF file for one sample at a time. The bcftools norm command does correct a good number of INDELs.
Thank you for the clarification. I will normalize the raw VCFs before sending them to HiPhase.
David
Closing for now, but feel free to re-open if there are follow ups!