Runtime much slower than minimap2 when using a larger segment length

Question

baozg opened this issue 2 years ago · 4 comments

Hi, all

I am using Arabidopsis genomes to test PAF input for SyRI, and I found that wfmash runs much slower than minimap2 when -s is set higher.

System: CentOS 7

fasta:

Command:
wfmash -t 32 -p 95 -s 1k TAIR10.fa.gz Ler0.fa.gz
(installed from conda; wfmash v0.8.2)

        wfmash -s 1k    wfmash -s 10k    minimap2 -ax asm20 --eqx -t 5
Time    1:20            04:40.4          1:46
CPU     2091%           752%             341%
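
Because the runs used different thread counts, wall-clock time alone can be misleading. Below is a rough sketch that converts the table into approximate CPU-seconds (wall time multiplied by CPU utilisation). It assumes the Time row is in minutes:seconds format; the helper names `to_seconds` and `cpu_seconds` are only illustrative.

```python
# Rough comparison of the runs in the table above.
# Assumption: the "Time" values are wall-clock times in m:ss or mm:ss.s format.

def to_seconds(clock: str) -> float:
    """Convert an 'm:ss' or 'mm:ss.s' string to seconds."""
    minutes, seconds = clock.split(":")
    return int(minutes) * 60 + float(seconds)


def cpu_seconds(clock: str, cpu_percent: float) -> float:
    """Approximate total CPU time: wall time x average CPU utilisation."""
    return to_seconds(clock) * cpu_percent / 100


# (wall time, CPU %) copied from the table
runs = {
    "wfmash -s 1k": ("1:20", 2091),
    "wfmash -s 10k": ("04:40.4", 752),
    "minimap2 -ax asm20 --eqx -t 5": ("1:46", 341),
}

for name, (clock, pct) in runs.items():
    wall = to_seconds(clock)
    cpu = cpu_seconds(clock, pct)
    print(f"{name}: wall {wall:.1f} s, approx. CPU {cpu:.0f} s")
```

By this estimate, both wfmash runs consume several times more total CPU time than the minimap2 run, even though wfmash -s 1k finishes sooner in wall-clock terms.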

Answer 1 · 2022-05-13T16:18:46.000Z

This is expected: a larger -s forces the mappings to span larger structural variations, which makes the alignments harder.

Answer 2 · 2022-05-13T16:24:16.000Z

For now, how should -s be set, approximately? -s no longer needs to exceed the length of large repeats.

Answer 3 · 2022-05-19T10:03:03.000Z

Hard to say. For short sequences (length L), I use -s << L (for example, L / 5 or L / 10). For longer sequences, I consider the structural variations or repeats involved, but I usually don't go beyond 50 kbp. For A. thaliana, I have some older experience with -s 10k, which seemed a good tradeoff.
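
The rule of thumb above could be written down roughly as follows. This is only a sketch of the heuristic as described in this comment, not anything wfmash does internally; the function name `suggest_segment_length`, the default divisor, and the hard cap are assumptions for illustration.

```python
# Sketch of the rule of thumb: -s is a fraction of the sequence length
# (L / 5 or L / 10), and is not pushed beyond about 50 kbp.
# Hypothetical helper; not part of wfmash.

MAX_SEGMENT_BP = 50_000  # "I usually don't go beyond 50 kbp"


def suggest_segment_length(seq_len_bp: int, divisor: int = 5) -> int:
    """Return a candidate -s value in bp for sequences of length seq_len_bp."""
    if seq_len_bp <= 0 or divisor <= 0:
        raise ValueError("sequence length and divisor must be positive")
    return min(seq_len_bp // divisor, MAX_SEGMENT_BP)


print(suggest_segment_length(20_000))              # short sequence, L / 5
print(suggest_segment_length(20_000, divisor=10))  # short sequence, L / 10
print(suggest_segment_length(30_000_000))          # long sequence, capped
```

For chromosome-scale sequences the cap is only an upper bound; a smaller value such as the 10k mentioned for A. thaliana may still be the better tradeoff.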

Answer 4 · 2022-05-23T03:15:53.000Z

Thanks for the explanation.