waveygang/wfmash

Runtime much slower than minimap2 when using a higher segment length

baozg opened this issue · 4 comments

baozg commented

Hi, all

I am using Arabidopsis genomes to test PAF input for SyRI, and I found that wfmash runs much slower than minimap2 when -s is set higher.

System: CentOS 7

fasta:

Command (wfmash v0.8.2, installed from conda):
wfmash -t 32 -p 95 -s 1k TAIR10.fa.gz Ler0.fa.gz

| | wfmash -s 1k | wfmash -s 10k | minimap2 -ax asm20 --eqx -t 5 |
| --- | --- | --- | --- |
| Time | 1:20 | 04:40.4 | 1:46 |
| CPU | 2091% | 752% | 341% |

This is expected: a bigger -s forces the mappings to cover bigger structural variations, which makes the alignments harder.

baozg commented

For now, roughly how should -s be set? -s no longer needs to exceed the length of large repeats.

Hard to say. For short sequences (length L), I use -s << L (for example, L / 5 or L / 10). For longer sequences, I take structural variations and repeats into account, but I usually don't go beyond 50 kbp. For A. thaliana, I have some older experience with -s 10k, which seemed a good tradeoff.
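
As a rough sketch of that rule of thumb (not an official recommendation), the arithmetic could look like the following. The function name `suggest_segment_length` is made up for illustration and is not part of wfmash; it assumes L is given in base pairs, divides it by 10 by default (or 5 if requested), and caps the result at 50 kbp.

```shell
#!/bin/sh
# Hypothetical helper, not part of wfmash: suggest a value for -s
# from the length L (in bp) of the sequences being mapped.
# Usage: suggest_segment_length L [divisor]
suggest_segment_length() (
    seq_len=$1            # L, in base pairs
    divisor=${2:-10}      # 5 or 10, following the rule of thumb
    cap=50000             # upper bound of about 50 kbp

    s=$(( seq_len / divisor ))
    if [ "$s" -gt "$cap" ]; then
        s=$cap
    fi
    echo "$s"
)

suggest_segment_length 200000      # prints 20000
suggest_segment_length 30000000    # prints 50000
```

The printed number is only a starting point. For chromosome-scale sequences the cap always wins, so other considerations (structural variations, repeats, runtime) decide the final value, as with the -s 10k choice for A. thaliana.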

baozg commented

Thanks for the explanation.