parallelization over chromosomes
subwaystation opened this issue · 4 comments
nf-core/pangenome feature request
Hi there!
Describe the solution you'd like
I want to be able to start the pipeline with a folder of FASTAs as an input. All current steps should be run on each of the FASTAs. This helps
- Add a new input parameter `--input-folder`
- Make sure that `--input` and `--input-folder` can't be set at the same time
- Ensure that the output in the results folder reflects the naming of the input FASTA file
- Add tests
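To make the first three checklist items concrete, here is a minimal Python sketch of the intended behaviour. It is not the pipeline's Nextflow code: the helper names `resolve_inputs` and `output_prefix` and the list of accepted FASTA suffixes are assumptions made purely for illustration.

```python
from pathlib import Path

# Assumed set of FASTA file endings; adjust to whatever the pipeline accepts.
FASTA_SUFFIXES = (".fa", ".fasta", ".fna", ".fa.gz", ".fasta.gz", ".fna.gz")


def resolve_inputs(input_file=None, input_folder=None):
    """Return the FASTA paths to process, rejecting ambiguous settings."""
    if input_file and input_folder:
        raise ValueError("--input and --input-folder can't be set at the same time")
    if not input_file and not input_folder:
        raise ValueError("one of --input or --input-folder is required")
    if input_file:
        return [Path(input_file)]
    folder = Path(input_folder)
    # Keep only regular files whose name ends with a known FASTA suffix.
    return sorted(
        p for p in folder.iterdir()
        if p.is_file() and p.name.endswith(FASTA_SUFFIXES)
    )


def output_prefix(fasta_path):
    """Derive a results-folder name from the FASTA file name."""
    name = Path(fasta_path).name
    # Try the longest suffixes first so ".fa.gz" wins over ".gz"-less variants.
    for suffix in sorted(FASTA_SUFFIXES, key=len, reverse=True):
        if name.endswith(suffix):
            return name[: -len(suffix)]
    return name
```

Each path returned by `resolve_inputs` would then be processed independently, with its results written under the name given by `output_prefix`, so that the output reflects the input FASTA file.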
We won't parallelize over chromosomes, but over disconnected components, which are usually chromosomes.
Closing this. The disconnected components will be tracked by another issue.
Markdown linting is failing
To keep the code consistent with lots of contributors, we run automated code consistency checks.
To fix this CI test, please run:
- Install `markdownlint-cli`
  - On Mac: `brew install markdownlint-cli`
  - Everything else: Install `npm`, then install `markdownlint-cli` (`npm install -g markdownlint-cli`)
- Fix the markdown errors
  - Automatically: `markdownlint . --config .github/markdownlint.yml --fix`
  - Manually resolve anything left from `markdownlint . --config .github/markdownlint.yml`
Once you push these changes the test should pass, and you can hide this comment 👍
We highly recommend setting up markdownlint in your code editor so that this formatting is done automatically on save. Ask about it on Slack for help!
Thanks again for your contribution!
Any updates on it?
There are 2 ways to parallelize:
- Run the pipeline in community detection mode with `--communities`. The idea is that similar and related sequences are clustered into the same community, so graph construction can run in parallel for each community.
- Split your sequences manually into chromosomal communities by a given reference and execute nf-core/pangenome for each reference community.
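For the manual route, the splitting step could look roughly like the Python sketch below. It assumes you already have a mapping from sequence name to reference chromosome (for example, derived from aligning your contigs to the reference); that `assignment` dictionary, the helper names, and the `unassigned` bucket are all hypothetical and not part of nf-core/pangenome.

```python
from collections import defaultdict


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def split_by_community(records, assignment, unassigned="unassigned"):
    """Group FASTA records by the community assigned to their sequence name."""
    groups = defaultdict(list)
    for header, seq in records:
        name = header.split()[0]  # sequence name is the first token of the header
        groups[assignment.get(name, unassigned)].append((header, seq))
    return dict(groups)
```

Each group can then be written to its own FASTA file and passed to a separate pipeline run via `--input`, which is what lets the runs proceed in parallel.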