ncbi/magicblast

Distant homology searches

Closed this issue · 2 comments

@boratyng
Hi Greg,
I understand that Magic-BLAST was not built for remote homology searches and that the default parameters are tuned for high-similarity searches. But because the paired-end reads in my data do not overlap, I cannot use classic blastn to query the entire RefSeq transcript collection. Is it possible to change the default reward/penalty, gap open/extend, and alignment score thresholds? Could you please recommend values that I can try to allow for distant matches down to 60% identity?

Thanks,
Manu

Hi @manu-script,

Magic-BLAST was not made for aligning sequences at 60% identity. As long as you are not aligning reads to genomic sequences, you can try these parameters: -word_size 12 -penalty -2 -limit_lookup F. I have never tried these alignments, so you may have to experiment with the penalty and the score threshold. I would start with the alignment score threshold (-score) at about 30% of your read length.
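As a rough sketch of how those parameters could be assembled into a command line, the snippet below builds a magicblast argument list and derives the -score value as 30% of the read length. The file names, database name, read length, and the helper names (score_threshold, build_magicblast_cmd) are hypothetical placeholders. The -query_mate and -infmt options are my assumption about how the paired FASTQ input would be passed and are not part of the suggestion above. Treat the values as a starting point for experiments, not a tested configuration.

```python
import shlex


def score_threshold(read_length, percent=30):
    """Alignment score cutoff as a percentage of read length (about 30% as suggested)."""
    return max(1, read_length * percent // 100)


def build_magicblast_cmd(reads_1, reads_2, db, read_length, out="hits.sam"):
    """Assemble a magicblast argument list with the relaxed settings from this thread."""
    return [
        "magicblast",
        "-query", reads_1,          # first mates (hypothetical file name)
        "-query_mate", reads_2,     # second mates (assumed paired-end input)
        "-infmt", "fastq",          # assumption: reads are in FASTQ format
        "-db", db,                  # BLAST database of transcripts (hypothetical name)
        "-word_size", "12",
        "-penalty", "-2",
        "-limit_lookup", "F",
        "-score", str(score_threshold(read_length)),
        "-out", out,
    ]


cmd = build_magicblast_cmd("reads_1.fastq", "reads_2.fastq", "refseq_rna", read_length=100)
print(shlex.join(cmd))  # copy the printed command, or pass cmd to subprocess.run
```

Keeping the threshold in a small function makes it easy to rerun the search with a different percentage while experimenting with the penalty and score cutoff.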

But there are caveats:

  • Magic-BLAST reports only the top-scoring alignment, whereas blastn reports all alignments that score better than some threshold.
  • Magic-BLAST does not compute E-values, so you need to figure out yourself whether you are getting any false positive search results.
  • Magic-BLAST uses a very greedy alignment extension algorithm, so the resulting alignments at 60% identity may not be optimal.

It may be easier to use BLASTN and post-process the results.
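If you go the BLASTN route, the post-processing could look roughly like the sketch below. It assumes the search was run with tabular output (-outfmt 6, default columns) and keeps, for each read, the highest-bitscore hit that passes an identity and length cutoff. The cutoffs, the function name best_hits, and the example rows are made up for illustration. Matching up the two mates of a pair afterwards is left out.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hits(handle, min_identity=60.0, min_length=50):
    """Return {query id: row} with the highest-bitscore hit passing both cutoffs."""
    best = {}
    for row in csv.DictReader(handle, fieldnames=FIELDS, delimiter="\t"):
        if float(row["pident"]) < min_identity or int(row["length"]) < min_length:
            continue  # too divergent or too short to trust
        current = best.get(row["qseqid"])
        if current is None or float(row["bitscore"]) > float(current["bitscore"]):
            best[row["qseqid"]] = row
    return best


# Made-up example rows standing in for a real BLASTN output file.
example = io.StringIO(
    "read1\ttx_A\t62.5\t80\t30\t0\t1\t80\t101\t180\t1e-5\t55.4\n"
    "read1\ttx_B\t70.0\t90\t27\t0\t1\t90\t5\t94\t1e-9\t80.1\n"
    "read2\ttx_C\t55.0\t90\t40\t0\t1\t90\t5\t94\t1e-2\t30.0\n"
)
for query, hit in best_hits(example).items():
    print(query, hit["sseqid"], hit["pident"], hit["evalue"])
```

For a real run, replace the StringIO object with an open file handle on the BLASTN output. Because BLASTN reports E-values, they can also be used as an extra filter here.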

Thanks a lot, Greg! I will keep those caveats in mind and try the suggested parameters.