Perform sequence partitioning for massive parallelization
subwaystation opened this issue · 1 comment
subwaystation commented
Is your feature request related to a problem? Please describe
We don't exploit the possible parallelization of the pipeline when applying sequence partitioning to the input FASTA. A bash implementation can be found at https://github.com/pangenome/pggb/pull/243/files.
Describe the solution you'd like
I want to partition the input sequences up front so that we can run all the current graph-building steps for each partition in parallel. This means we want to generate one report per partition!
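To make the idea concrete, here is a minimal sketch in Python, not the pipeline's implementation. It assumes partitions can be derived from the sequence names (here, hypothetically, the last `#`-delimited field of each name); this issue does not say how the partitions are actually computed. `build_partition` is a hypothetical placeholder standing in for the per-partition graph-building steps.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def read_fasta(text):
    """Yield (name, sequence) pairs from FASTA-formatted text."""
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Keep only the first whitespace-delimited token as the name.
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def partition_records(records, key):
    """Group records into partitions according to key(name)."""
    partitions = defaultdict(list)
    for name, seq in records:
        partitions[key(name)].append((name, seq))
    return dict(partitions)


def build_partition(item):
    """Hypothetical stand-in for the graph-building steps of one partition.

    Here it only returns the total sequence length of the partition.
    """
    label, records = item
    return label, sum(len(seq) for _, seq in records)


def run_in_parallel(partitions, workers=4):
    """Run build_partition once per partition, concurrently."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(build_partition, partitions.items()))


# Assumed partition key: the last '#'-delimited field of the sequence name.
example_fasta = ">sampleA#1#chr1\nACGT\n>sampleB#1#chr1\nACGTAC\n>sampleA#1#chr2\nGG\n"
example_parts = partition_records(read_fasta(example_fasta), lambda n: n.split("#")[-1])
example_results = run_in_parallel(example_parts)
```

Each partition is independent of the others once it has been written out, which is what would allow a workflow manager to schedule the per-partition steps as separate jobs and to emit one report per partition.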
Describe alternatives you've considered
PGGB can't be run in parallel across the partitions, at least not in general on all HPCs.
Additional context
MOVE!
subwaystation commented
This has actually been in for quite some time.