Perform sequence partitioning for massive parallelization

Question

subwaystation opened this issue 2 years ago · 1 comments

Is your feature request related to a problem? Please describe

The pipeline does not yet exploit the parallelization that becomes possible when the input FASTA is partitioned by sequence. A bash implementation can be found at https://github.com/pangenome/pggb/pull/243/files.

Describe the solution you'd like

I want to partition the input sequences up front so that all of the current graph building steps can run for each partition in parallel. This means we want to generate one FASTQ report per partition!
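
To make the idea concrete, here is a minimal Python sketch of "partition first, then process every partition independently". It is not the approach used in the linked bash implementation. The grouping rule (the text after the last `#` in the sequence name), the function names, and the placeholder `build_partition` step are all assumptions made for illustration. A thread pool stands in for whatever scheduler would actually launch the per-partition jobs.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def read_fasta(text):
    """Yield (name, sequence) pairs from FASTA-formatted text."""
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            header = line[1:].split()
            name, chunks = (header[0] if header else ""), []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def partition_key(name):
    # Hypothetical rule: group by the text after the last '#' in the name.
    return name.rsplit("#", 1)[-1]


def partition_records(records, key=partition_key):
    """Group (name, sequence) pairs into a dict of partition -> records."""
    partitions = defaultdict(list)
    for name, seq in records:
        partitions[key(name)].append((name, seq))
    return dict(partitions)


def build_partition(item):
    # Placeholder for the per-partition graph building steps; here it
    # only computes a small summary so the sketch stays self-contained.
    key, records = item
    summary = {
        "sequences": len(records),
        "bases": sum(len(seq) for _, seq in records),
    }
    return key, summary


def run_all(partitions, workers=4):
    """Process every partition independently and collect one result each."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(build_partition, sorted(partitions.items())))
```

Because each partition is handled without reference to the others, the same structure maps onto one job per partition, with one result (for example, one report) collected per partition.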

Describe alternatives you've considered

PGGB can't be run in parallel across the partitions, at least not in general on all HPCs.

Additional context

MOVE!

Answer 1 · 2023-04-11T08:08:22.000Z

This has actually been in for quite some time.