A pipeline to build Qiime2 taxonomy classifiers for the UNITE database.
Set up:
- Install Mambaforge and configure Bioconda.
- Install the version of Qiime2 you want, using the recommended environment name. (For a faster install, you can replace conda with mamba.)
- Install Snakemake into an environment, then activate that environment.
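The tip about swapping conda for mamba can be automated. The sketch below defines a hypothetical helper, pick_installer (not part of this pipeline), that prints whichever tool is on PATH and prefers mamba. The commented usage follows the install command from the Snakemake documentation; the environment name "snakemake" is only an example.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "mamba" if it is available, else "conda".
# Fails with a message on stderr if neither is found.
pick_installer() {
  if command -v mamba >/dev/null 2>&1; then
    echo mamba
  elif command -v conda >/dev/null 2>&1; then
    echo conda
  else
    echo "Neither mamba nor conda found on PATH" >&2
    return 1
  fi
}

# Usage (not run here):
#   installer="$(pick_installer)"
#   "$installer" create -c conda-forge -c bioconda -n snakemake snakemake
#   conda activate snakemake
```

Both tools accept the same create subcommand, so the rest of the set up does not change with the choice.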
Configure:
- Open config/config.yaml and configure it to your liking. (For example, you may need to update the name of your Qiime2 environment.)
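Before editing config/config.yaml, it can help to confirm the exact name of the Qiime2 environment you installed. The hypothetical helper below, env_in_list, reads `conda env list` output on stdin and succeeds only if the given name appears in the first column. The environment name in the usage comment is an example, not necessarily the one this pipeline expects.

```shell
#!/usr/bin/env bash
# Hypothetical helper: exit 0 if an environment named $1 appears in the
# `conda env list` output read from stdin, exit 1 otherwise.
# Header lines start with "#"; the environment name is the first column.
env_in_list() {
  awk -v name="$1" '!/^#/ && $1 == name { found = 1 } END { exit !found }'
}

# Usage (example environment name):
#   conda env list | env_in_list qiime2-2023.5 && echo "environment found"
```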
Run:
snakemake --cores 8 --use-conda --resources mem_mb=10000
Training one classifier takes 1-9 hours on an AMD EPYC 75F3 Milan, depending on the size and complexity of the data.
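Because a single classifier can take hours to train, a dry run (Snakemake's --dry-run flag) is a cheap way to check the planned jobs before committing to them. The sketch below wraps the command above in a hypothetical function, run_local, that stops if the dry run fails; the core count and memory limit are copied from the command above and should be adjusted to your machine.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: preview the jobs, then run the same command for real.
run_local() {
  # set -- only replaces this function's own positional parameters.
  set -- --cores 8 --use-conda --resources mem_mb=10000
  snakemake "$@" --dry-run || return  # stop if the DAG cannot be built
  snakemake "$@"
}
```

Reusing one argument list keeps the dry run and the real run from drifting apart when you edit the flags.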
Run on a Slurm cluster:
More specifically, the University of Florida HiPerGator supercomputer, with access generously provided by the Kawahara Lab!
screen # We connect to a random login node, so we may not be able...
screen -r # to reconnect with this later on.
snakemake --jobs 24 --slurm \
--rerun-incomplete --retries 3 \
--use-envmodules --latency-wait 10 \
--default-resources slurm_account=kawahara slurm_partition=hpg-milan
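The screen comments above point at a catch: a screen session lives on one login node, and each new connection may land on a different one. One workaround is to record the node name before starting screen. It assumes that your cluster lets you ssh from one login node to another by hostname, so check your site's documentation. The helper remember_login_node and its default path ~/.snakemake_login_node are hypothetical names, not part of this pipeline.

```shell
#!/usr/bin/env bash
# Hypothetical helper: save the current login node's name so you can
# return to it later. Writes to $1, or to a default file in $HOME.
remember_login_node() {
  uname -n > "${1:-$HOME/.snakemake_login_node}"
}

# Usage (not run here):
#   remember_login_node && screen
#   # later, from whichever login node you land on:
#   ssh -t "$(cat ~/.snakemake_login_node)" screen -r
```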
Run with Docker:
Say, in 'the cloud' using FlowDeploy.
snakemake --jobs 12 \
--rerun-incomplete --retries 3 \
--use-singularity \
--default-resources
Reports:
snakemake --report results/report.html
snakemake --forceall --dag --dryrun | dot -Tpdf > results/dag.pdf
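In the DAG command above, a failed snakemake call still hands dot an empty input, and a missing Graphviz install only shows up at the end. The hypothetical function render_dag below checks for dot first, captures the DAG so a Snakemake error stops the render, and takes the output format (for example pdf, svg, or png) as an argument. The results/dag.<format> naming follows the command above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: render the workflow DAG to results/dag.<format>.
# $1: a Graphviz output format such as pdf, svg, or png (default: pdf).
render_dag() {
  command -v dot >/dev/null 2>&1 || {
    echo "Graphviz 'dot' not found on PATH" >&2
    return 1
  }
  local dag_dot
  # Capture first so a Snakemake failure is not hidden by the pipe.
  dag_dot=$(snakemake --forceall --dag --dryrun) || return
  mkdir -p results
  printf '%s\n' "$dag_dot" | dot -T"${1:-pdf}" > "results/dag.${1:-pdf}"
}

# Usage:
#   render_dag svg
```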