Limit to reported alignments?
The latest version was able to index a 22.5 Gb genome (1.75 million scaffolds) in 32 min using 16 cores and 99 Gb RAM, and align a file of 51,751 proteins to the index in 31 min using 16 cores and 42 Gb RAM. Thanks to @lh3 for the quick fixes! The output GFF file reports multiple alignment positions for many proteins, which is expected given the abundance of pseudogenes in this assembly. However, the distribution of the number of alignment positions per protein appears to be truncated at 51: there are 2,513 proteins with exactly 51 reported alignment positions, and no proteins with more than that. Is this the expected behavior? In this assembly, it would not be unreasonable to see hundreds of alignment positions for some proteins.
Glad to know miniprot works on your 22 Gb fragmented assembly in reasonable time. Thanks for testing!
If you want to see more alignments, increase both -N and --outn to something like:
miniprot -N 1000 --outn=1000
-N controls how many hits miniprot evaluates internally. Increasing its value will make miniprot run slower. --outn controls how many hits to output. It doesn't affect performance much.
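To check whether raising the limits actually lifts the cap, it may help to recompute the distribution of alignment positions per protein from the GFF output. The following is only a rough sketch: it assumes each alignment appears as one mRNA record whose attribute column carries a Target=<protein> <start> <end> entry, and the function names and sample records are hypothetical, made up for illustration.

```python
from collections import Counter

def hits_per_protein(gff_lines):
    """Count mRNA records per protein name (assumed to be in the Target attribute)."""
    counts = Counter()
    for line in gff_lines:
        if line.startswith("#"):
            continue  # skip header and comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "mRNA":
            continue  # count one record per alignment, not per CDS
        attrs = dict(kv.split("=", 1) for kv in fields[8].split(";") if "=" in kv)
        target = attrs.get("Target")
        if target:
            counts[target.split()[0]] += 1  # first token is the protein name
    return counts

def hit_count_distribution(counts):
    """Map 'number of alignment positions' -> 'number of proteins with that many'."""
    return Counter(counts.values())

# Hypothetical records, invented for illustration only.
SAMPLE = [
    "##gff-version 3",
    "scaf1\tminiprot\tmRNA\t100\t400\t250\t+\t.\tID=MP000001;Rank=1;Target=protA 1 100",
    "scaf1\tminiprot\tCDS\t100\t400\t250\t+\t0\tParent=MP000001;Target=protA 1 100",
    "scaf2\tminiprot\tmRNA\t50\t340\t230\t-\t.\tID=MP000002;Rank=2;Target=protA 1 98",
    "scaf3\tminiprot\tmRNA\t10\t160\t120\t+\t.\tID=MP000003;Rank=1;Target=protB 1 50",
]

distribution = hit_count_distribution(hits_per_protein(SAMPLE))
for n_hits in sorted(distribution):
    print(n_hits, distribution[n_hits])
```

On a real run, replace SAMPLE with the lines of the output file. If the largest hit count still has a pile of proteins sitting exactly on it, the limit is probably still being reached.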