prophyle/prophasm2

Feature request: Parallelism

karel-brinda opened this issue · 2 comments

The -t {threads} parameter

Parallelizing ProPhasm (local greedy or global greedy) will be quite challenging IMHO and will definitely require a lot of care: when extending several local paths (simplitigs or pseudosimplitigs) in parallel, one needs to lock individual k-mers so that no two threads add the same k-mer. The program may also need more memory.
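To make the locking requirement concrete, here is a minimal sketch in Python (not ProPhasm's actual code). It assumes k >= 2, extends paths to the right only, and ignores reverse complements; `KmerPool`, `try_claim`, `extend_right` and `compute_simplitigs` are hypothetical names made up for this illustration. The k-mer set is split into shards, each guarded by its own lock, so a thread has to win the claim on a k-mer before appending it to its path.

```python
import threading


class KmerPool:
    """Unused k-mers split into shards; each shard has its own lock (hypothetical helper)."""

    def __init__(self, kmers, n_shards=16):
        self._shards = [set() for _ in range(n_shards)]
        self._locks = [threading.Lock() for _ in range(n_shards)]
        for kmer in kmers:
            self._shards[hash(kmer) % n_shards].add(kmer)

    def try_claim(self, kmer):
        """Return True only for the single thread that removes kmer from the pool."""
        i = hash(kmer) % len(self._shards)
        with self._locks[i]:
            if kmer in self._shards[i]:
                self._shards[i].remove(kmer)
                return True
            return False


def extend_right(seed, pool):
    """Greedily extend a claimed seed to the right, claiming each k-mer before using it."""
    k = len(seed)  # assumption: k >= 2
    path = seed
    extended = True
    while extended:
        extended = False
        for base in "ACGT":
            candidate = path[-(k - 1):] + base
            if pool.try_claim(candidate):
                path += base
                extended = True
                break
    return path


def compute_simplitigs(kmers, n_threads=4):
    """Build right-extended paths in parallel; every k-mer ends up in exactly one path."""
    kmers = list(set(kmers))
    pool = KmerPool(kmers)
    results = []
    results_lock = threading.Lock()

    def worker(seeds):
        for seed in seeds:
            # A seed may already have been consumed by another thread's path.
            if pool.try_claim(seed):
                path = extend_right(seed, pool)
                with results_lock:
                    results.append(path)

    threads = [
        threading.Thread(target=worker, args=(kmers[i::n_threads],))
        for i in range(n_threads)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results
```

Because of the interpreter lock, CPython threads would not actually speed this up; the sketch only shows the claiming discipline. The per-shard sets and locks also hint at where the extra memory and the contention would come from in a real implementation.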

Likewise, global greedy in the hash-table implementation may be parallelized by searching for length-d overlaps in parallel, again using locking on individual k-mers that are merged.

This is actually an interesting question.

I think we don't need to parallelize the actual computation of simplitigs (it's quite fast on its own, I believe), but rather other levels:

  1. simultaneous reading from multiple files
  2. parallelized reading of individual files (e.g., individual sequences)
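Level 1 could look roughly like the following Python sketch. It assumes plain uncompressed FASTA input and does not canonicalize k-mers; `read_fasta_sequences`, `kmers_of` and `load_kmers` are hypothetical names, not part of ProPhasm. Files are read concurrently by a thread pool and the k-mer sets are merged afterwards in a single thread, so no locking is needed on the shared set.

```python
from concurrent.futures import ThreadPoolExecutor


def read_fasta_sequences(path):
    """Return all sequences of one FASTA file, joining multi-line records."""
    sequences, chunks = [], []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if chunks:
                    sequences.append("".join(chunks))
                    chunks = []
            elif line:
                chunks.append(line.upper())
    if chunks:
        sequences.append("".join(chunks))
    return sequences


def kmers_of(sequence, k):
    """Return the set of k-mers of a sequence (empty if the sequence is shorter than k)."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def load_kmers(paths, k, threads=4):
    """Read several files concurrently, then merge their k-mers in the calling thread."""
    with ThreadPoolExecutor(max_workers=threads) as executor:
        per_file = list(executor.map(read_fasta_sequences, paths))
    kmers = set()
    for sequences in per_file:
        for sequence in sequences:
            kmers |= kmers_of(sequence, k)
    return kmers
```

Here `threads` would play the role of the proposed `-t` parameter. Level 2 (splitting a single file across threads) is harder, since record boundaries are only known after scanning the file.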

Some relevant experiments regarding parallelization and optimized reading were done here:
https://github.com/lh3/kmer-cnt