Distant homology searches
@boratyng
Hi Greg,
I understand that Magic-BLAST was not built for remote homology searches and that its default parameters are tuned for high-similarity searches. But because my data consists of non-overlapping paired-end reads, I cannot use classic blastn to query the entire RefSeq transcript collection. Is it possible to change the default reward/penalty, gap open/extend, and alignment score thresholds? Could you please recommend values that I can try in order to allow distant matches down to 60% identity?
Thanks,
Manu
Hi @manu-script,
Magic-BLAST was not made for aligning sequences at 60% identity. As long as you are not aligning reads to genomic sequences, you can try these parameters: `-word_size 12 -penalty -2 -limit_lookup F`. I have never tried these alignments, so you may have to experiment with the penalty and the score threshold. I would start with the alignment score threshold (`-score`) at about 30% of your read length.
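As a rough sketch of that starting point, the snippet below turns a read length into a `-score` value at 30% of the length and assembles the options mentioned above. The helper names (`score_threshold`, `build_args`) and the read length of 100 are made up for illustration, and rounding to the nearest integer is an assumption; nothing here has been tested against real Magic-BLAST runs.

```python
import shlex
from typing import List


def score_threshold(read_length: int, fraction: float = 0.30) -> int:
    """Return a score cutoff as a fraction of the read length (at least 1)."""
    return max(1, round(read_length * fraction))


def build_args(read_length: int, penalty: int = -2, word_size: int = 12) -> List[str]:
    """Assemble the options suggested in this thread (hypothetical helper)."""
    return [
        "-word_size", str(word_size),
        "-penalty", str(penalty),
        "-limit_lookup", "F",
        "-score", str(score_threshold(read_length)),
    ]


if __name__ == "__main__":
    # Read length of 100 is only an example; add your own query and
    # database options to the printed command before running it.
    print(shlex.join(["magicblast"] + build_args(read_length=100)))
```

Changing `penalty` or `fraction` and rerunning is an easy way to scan a few settings while you experiment.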
But there are caveats:
- Magic-BLAST reports only the top-scoring alignment, unlike blastn, which reports all alignments that score better than some threshold.
- Magic-BLAST does not compute E-values, so you need to work out for yourself whether you are getting any false-positive search results.
- Magic-BLAST uses a very greedy alignment extension algorithm, so the resulting alignments at 60% identity may not be optimal.
It may be easier to use BLASTN and post-process the results.
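One possible way to do that post-processing is sketched below, assuming BLASTN was run with the default 12-column tabular output (`-outfmt 6`). It keeps the highest-bitscore hit per query after filtering on identity and E-value. The function name `best_hits`, the 60% identity floor (taken from the question), and the E-value cutoff are illustrative choices, not recommended settings, and the sample rows are invented.

```python
import csv
from typing import Dict, Iterable

# Default column order of BLAST tabular output (-outfmt 6).
FIELDS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def best_hits(lines: Iterable[str], min_identity: float = 60.0,
              max_evalue: float = 1e-3) -> Dict[str, Dict[str, str]]:
    """Keep the highest-bitscore hit per query that passes both filters."""
    best: Dict[str, Dict[str, str]] = {}
    for row in csv.reader(lines, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        hit = dict(zip(FIELDS, row))
        if float(hit["pident"]) < min_identity:
            continue
        if float(hit["evalue"]) > max_evalue:
            continue
        current = best.get(hit["qseqid"])
        if current is None or float(hit["bitscore"]) > float(current["bitscore"]):
            best[hit["qseqid"]] = hit
    return best


if __name__ == "__main__":
    # Invented rows, for illustration only.
    sample = [
        "read1\ttxA\t72.5\t90\t20\t2\t1\t90\t100\t189\t1e-10\t80.5",
        "read1\ttxB\t65.0\t88\t28\t3\t1\t88\t5\t92\t1e-06\t60.2",
        "read2\ttxC\t55.0\t70\t30\t1\t1\t70\t1\t70\t1e-04\t40.0",
    ]
    for query, hit in best_hits(sample).items():
        print(query, hit["sseqid"], hit["pident"], hit["bitscore"])
```

For paired-end data you would run this on each mate's results and then reconcile the two tables, for example by requiring both mates to hit the same transcript.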
Thanks a lot, Greg! I will keep those caveats in mind and try the suggested parameters.