EBI Genome Bioinformatics: Scaling Things Up

This is the code repository used for the "Scaling Things Up" section of the EBI course Genome Bioinformatics, known in previous years as "NGS Bioinformatics".

This section follows the first three days of the course, which covered command-line tools and basic bioinformatics commands for indexing files and aligning FASTQ files to a reference genome. Here we reuse the commands learnt on those days and run them with parallelisation and job scheduling.

The following README is a copy of the 2021 Google Docs walkthrough of the interactive part of the session.

Parallelisation

Run git clone on this repository
Go into the folder you just cloned, and then inside the “Parallelisation” folder
Open the align_all_extra_fqs.sh script. What do you think the script will do?
Do you think the script will take a long time to run? What command could we use to time how long a script takes?

Modify the script so that instead of running each alignment, it echoes the align command to a file we will call align_commands.sh
Run the script using the parallel command. You can also use the time command to measure how long it takes to run
How long did it take when using parallel to run the command?
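
The echo step above can be sketched roughly as follows. This is only an illustration, not the course's actual script: the reference path, the sample names, and the exact bwa mem and samtools pipeline are assumptions, since the contents of align_all_extra_fqs.sh are not reproduced in this README.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for align_all_extra_fqs.sh.
# The reference path and sample names below are assumptions for illustration.

ref="reference.fa"          # assumed reference genome path
: > align_commands.sh       # start from an empty command file

for sample in sample1 sample2 sample3; do
    # echo writes the command as text instead of running it,
    # so each line of align_commands.sh is one independent alignment
    echo "bwa mem ${ref} ${sample}.fq | samtools sort -o ${sample}.bam -" >> align_commands.sh
done

# Show the commands that were written
cat align_commands.sh
```

With GNU parallel installed, `time parallel < align_commands.sh` should then run one line per job across the available cores and report the elapsed time, which can be compared with `time bash align_commands.sh`.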

Job Schedulers

Remove the echo we added to align_all_extra_fqs.sh so that it will run everything in a for loop
Do you remember how to submit a job with Slurm? (hint: it's the sbatch command followed by what you want to run)
Run squeue to see your job running. You should see something like this:
We will now kill our job using the scancel command followed by the JOBID. For me, this is scancel 8. Find your JOBID with squeue and cancel the job
Remove the bam files we generated here
Edit the align_all_extra_fqs.sh file to submit each bwa mem command to Slurm
See all the jobs running at once
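
One way to submit each alignment as its own Slurm job is sketched below. It is a dry run under stated assumptions: the reference path, sample names, job names, and the bwa mem and samtools pipeline are hypothetical, and the sbatch lines are written to a file (submit_jobs.sh, a name chosen here) rather than executed, so the sketch can be inspected on a machine without Slurm.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: one sbatch submission per sample.
# Paths, sample names, and job names are assumptions for illustration.

ref="reference.fa"          # assumed reference genome path
: > submit_jobs.sh          # start from an empty submission file

for sample in sample1 sample2 sample3; do
    cmd="bwa mem ${ref} ${sample}.fq | samtools sort -o ${sample}.bam -"
    # --wrap lets sbatch take a command string instead of a batch script;
    # --job-name makes each job easy to spot in the squeue output
    printf 'sbatch --job-name=align_%s --wrap="%s"\n' "${sample}" "${cmd}" >> submit_jobs.sh
done

# Show the submissions that would be made
cat submit_jobs.sh
```

On a cluster where Slurm is available, `bash submit_jobs.sh` would submit the jobs, `squeue -u "$USER"` would list them, and `scancel` followed by a JOBID would cancel one.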
