Polish only INDELs or only SNPs, and polishing thresholds
mmontonerin opened this issue · 1 comment
Hi, I have tried NextPolish, and overall I am happy with it, but I miss having more options to select what gets polished, so that I can trust what it is doing to the de novo genome assemblies I am working with.
One feature I miss in NextPolish is the ability to fix either only INDELs or only SNPs, depending on the type of data being used. For example, I have a set of short reads that I would want to use to correct only INDELs, as many SNPs could just be normal heterozygous sites, present in different proportions in different datasets.
I would also like to be able to polish more conservatively, by setting a depth or quality threshold that a position must meet before it is polished.
Do you plan to implement any of these functionalities in the future?
Hi, first, thank you for your good suggestions. However, NextPolish corrects erroneous bases using k-mers, so it does not distinguish between SNPs and INDELs. For heterozygous k-mers, NextPolish selects the k-mer with the highest count as the corrected k-mer.
BTW, I will consider your suggestions and may add some extra functions/parameters in the future.
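To make the majority-count rule from the reply above concrete, here is a toy sketch in Python. It is not NextPolish's actual code: the function name `pick_majority_kmer` and the example k-mers are hypothetical, and it only illustrates the stated idea that the k-mer with the highest count wins, without regard to whether the competing k-mers arose from a SNP or an INDEL.

```python
from collections import Counter


def pick_majority_kmer(candidate_kmers):
    """Return the most frequently observed k-mer among the candidates.

    Hypothetical helper for illustration only; it mirrors the rule
    "select the k-mer with the most counts" and nothing more.
    """
    if not candidate_kmers:
        raise ValueError("no candidate k-mers supplied")
    counts = Counter(candidate_kmers)
    # most_common(1) returns a list with the single (kmer, count) pair
    # that has the highest count.
    kmer, _count = counts.most_common(1)[0]
    return kmer


# Made-up k-mers observed in reads covering one position. The two
# variants differ at the third base, as they might at a heterozygous site.
observed = ["ACGTA", "ACGTA", "ACCTA", "ACGTA", "ACCTA"]
chosen = pick_majority_kmer(observed)
print(chosen)
```

Because the choice depends only on counts, a heterozygous site where the alternative allele happens to be better covered would be changed to that allele, which is the behaviour the original question is concerned about.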