A collection of code and notes from my graduate work.
The scripts `blast_setup_niagara.sh`, `blast_submit_script_niagara.sh`, and `run_blast.sh` are for running BLAST+ on the Niagara supercomputer. BLAST+ is available on Niagara as a loadable module, so all you need to do is upload your query and subject sequences. I used these scripts to BLAST a large number of spacer sequences against metagenomic read data, where the very large FASTA file of reads was split into smaller files of 500,000 reads each. The workflow parallelizes efficiently only when there are many queries or many separate subject sub-files. Niagara allocates whole 40-core nodes, so at least 40 serial jobs are needed to keep a node fully busy.
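The read data above was pre-split into files of 500,000 reads. One way to produce that layout is sketched below. It assumes every FASTA record is exactly two lines (a header and an unwrapped sequence), so 500,000 reads correspond to 1,000,000 lines. The function name `split_reads` is hypothetical. It relies on the standard `split` utility, whose default two-letter suffixes (`aa`, `ab`, ...) give file names of the `<accession>_aa` form described below.

```shell
#!/usr/bin/env bash
# Hypothetical helper: split a FASTA file of reads into chunks of N reads each,
# written as <accession>/<accession>_aa, <accession>/<accession>_ab, ...
# ASSUMPTION: every record is exactly two lines (header + unwrapped sequence).
split_reads() {
    local accession="$1"
    local fasta="$2"
    local reads_per_file="${3:-500000}"   # default: 500,000 reads per file

    # Refuse input whose odd-numbered lines are not headers (e.g. wrapped sequences).
    if ! awk 'NR % 2 == 1 && !/^>/ { bad = 1; exit } END { exit bad }' "$fasta"; then
        echo "error: $fasta is not in two-line FASTA format" >&2
        return 1
    fi

    mkdir -p "$accession"
    # Two lines per read, so each chunk holds 2 * reads_per_file lines.
    split -l $(( 2 * reads_per_file )) "$fasta" "${accession}/${accession}_"
}
```

For example, `split_reads SRR1873837 SRR1873837.fasta` would fill the folder `SRR1873837` with `SRR1873837_aa`, `SRR1873837_ab`, and so on (the input file name here is illustrative). Wrapped FASTA files would first need their sequences joined onto single lines.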
- Upload the scripts and the subject data. `blast_setup_niagara.sh` expects the subject data to be FASTA files in a folder called `<accession>`, with each FASTA file in that folder named `<accession>_<suffix>` (e.g. a folder `SRR1873837` containing files `SRR1873837_aa`, etc.). Place `blast_setup_niagara.sh`, the subject data, and the query FASTA file together in one folder.
- Run `bash blast_setup_niagara.sh <accession> <query>`. This creates a folder called `<accession>_blast` containing a series of scripts named `doserialjob0001.sh`, etc., and copies `blast_submit_script.sh` and `run_blast.sh` into that folder.
- Run `sbatch blast_submit_script.sh <accession>` from the same folder to submit the job to the queue.
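The setup step can be pictured roughly as follows. This is a hypothetical sketch, not the contents of `blast_setup_niagara.sh`. It writes one numbered `doserialjobNNNN.sh` script per subject sub-file into `<accession>_blast`, and each script calls `blastn` with `-query`, `-subject`, `-outfmt 6`, and `-out`. Those are real BLAST+ options, but the program and settings chosen here are assumptions; the actual scripts may delegate to `run_blast.sh` and use different parameters. The function name `make_serial_jobs` and the `.tsv` output naming are also hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the setup step: one serial-job script per subject
# sub-file, written to <accession>_blast/doserialjob0001.sh, 0002, ...
# ASSUMPTION: blastn with tabular output; the real scripts may differ.
make_serial_jobs() {
    local accession="$1"
    local query="$2"          # query FASTA file, relative to the current folder
    local outdir="${accession}_blast"
    local n=0
    local subject job

    mkdir -p "$outdir"
    for subject in "$accession"/"$accession"_*; do
        [ -f "$subject" ] || continue   # skip the literal pattern if nothing matched
        n=$((n + 1))
        job=$(printf '%s/doserialjob%04d.sh' "$outdir" "$n")
        # Absolute paths so the job runs correctly from any working directory.
        printf '#!/usr/bin/env bash\nblastn -query "%s" -subject "%s" -outfmt 6 -out "%s"\n' \
            "$PWD/$query" "$PWD/$subject" "$PWD/$outdir/$(basename "$subject").tsv" > "$job"
        chmod +x "$job"
    done
    echo "$n serial jobs written to $outdir"
}
```

For example, `make_serial_jobs SRR1873837 query.fasta` would write one job per file in `SRR1873837` (the query file name is illustrative). Keeping each sub-file in its own small script makes the jobs independent, so a submit script only has to run them many at a time; with 40 or more scripts, every core on a Niagara node has work.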