Scripts from the "Scaling things up" talk I gave at the EBI NGS Bioinformatics course in Sept 2019 and Feb 2021.

EBI Genome Bioinformatics: Scaling Things Up

This is the code repository used for the "Scaling Things Up" section of the EBI course Genome Bioinformatics, known in previous years as "NGS Bioinformatics".

This section follows the previous three days of the course, which cover command-line tools and basic bioinformatics commands for indexing files and aligning FASTQ files to a reference genome. Here we reuse the commands learnt during those days and run them using parallelisation and job scheduling.

The following README is a copy of the 2021 Google Docs walkthrough of the interactive part of the session.

Parallelisation

  1. Run git clone on this repository
  2. Go into the folder you just cloned, and then inside the “Parallelisation” folder
  3. Open the align_all_extra_fqs.sh script. What do you think the script will do?
  4. Do you think the script will take a long time to run? What command could we use to time how long a script takes?
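
As a hint for step 4, here is a minimal sketch of timing a script with the bash `time` keyword. The file `demo_slow.sh` is a hypothetical stand-in created only so the example runs anywhere; in the session you would time `align_all_extra_fqs.sh` instead.

```shell
#!/usr/bin/env bash
# Create a small stand-in script (hypothetical, not part of the repository)
cat > demo_slow.sh <<'EOF'
#!/usr/bin/env bash
# Pretend each loop iteration is one slow alignment
for i in 1 2 3; do
    sleep 0.2
    echo "finished task $i"
done
EOF

# `time` runs the command, then reports real/user/sys times on stderr
time bash demo_slow.sh

# In the course you would run:
#   time bash align_all_extra_fqs.sh
```

The `real` value is the wall-clock time, which is the number to compare against the parallel run later on.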

[Screenshot: time result for non-parallel alignments]

  1. Modify the script so that, instead of running each alignment, it echoes the alignment command to a file we will call align_commands.sh
  2. Run the commands in align_commands.sh using the parallel command. You can use the time command again to measure how long this takes
  3. How long did it take when using parallel to run the command?
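
The steps above could look roughly like the sketch below. The contents of `align_all_extra_fqs.sh` are not reproduced here, so the file layout (`fastqs/*.fastq.gz`, single-end reads), the reference name `reference.fa`, the `samtools sort` step and the function name `write_align_commands` are all assumptions made for illustration. It also assumes GNU parallel, which runs each line read from standard input as a separate command.

```shell
#!/usr/bin/env bash
# Assumed layout (not taken from the repository): single-end reads in
# fastqs/*.fastq.gz and a bwa-indexed reference called reference.fa
REF="reference.fa"
FASTQ_DIR="fastqs"

# Write one alignment command per FASTQ file instead of running it
write_align_commands() {
    local out="$1" fq sample
    : > "$out"                       # start from an empty commands file
    for fq in "$FASTQ_DIR"/*.fastq.gz; do
        [ -e "$fq" ] || continue     # skip if the glob matched nothing
        sample=$(basename "$fq" .fastq.gz)
        echo "bwa mem $REF $fq | samtools sort -o $sample.bam -" >> "$out"
    done
}

write_align_commands align_commands.sh

# Run the commands concurrently (one job per line) and time the whole batch.
# Only attempted when parallel is installed and there is something to run.
if command -v parallel >/dev/null 2>&1 && [ -s align_commands.sh ]; then
    time parallel < align_commands.sh
fi
```

By default GNU parallel runs one job per CPU core; `parallel -j 4 < align_commands.sh` would cap it at four jobs at a time.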

Job Schedulers

  1. Remove the echo we added to align_all_extra_fqs.sh so that it will run everything in a for loop

  2. Do you remember how to submit a job with slurm? (hint: it's the sbatch command followed by what you want to run)

  3. Run squeue to see your job running. You should see something like this: [Screenshot: squeue result for non-parallel alignments]

  4. We will now kill our job using the scancel command followed by the JOBID. For me, this is scancel 8. Find your JOBID with squeue and cancel the job

  5. Remove the bam files we generated here

  6. Edit the align_all_extra_fqs.sh file to submit each bwa mem command to slurm

  7. See all the jobs running at once: [Screenshot: squeue result for parallel alignments]
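
One possible way to do step 6 is sketched below, using `sbatch --wrap` to submit each alignment as its own job. As before, the file layout, the reference name, the `samtools sort` step, the job and log names, and the `SUBMIT` variable are assumptions for illustration rather than the repository's actual script. Setting `SUBMIT=echo` turns the loop into a dry run that prints each submission instead of sending it to the scheduler.

```shell
#!/usr/bin/env bash
# Assumed layout (not taken from the repository)
REF="reference.fa"
FASTQ_DIR="fastqs"

# Command used to submit jobs; set SUBMIT=echo for a dry run
SUBMIT="${SUBMIT:-sbatch}"

# Submit one slurm job per FASTQ file
submit_alignments() {
    local fq sample
    for fq in "$FASTQ_DIR"/*.fastq.gz; do
        [ -e "$fq" ] || continue     # skip if the glob matched nothing
        sample=$(basename "$fq" .fastq.gz)
        # --wrap lets sbatch run a command string without a separate script
        "$SUBMIT" --job-name="align_$sample" --output="align_$sample.log" \
            --wrap="bwa mem $REF $fq | samtools sort -o $sample.bam -"
    done
}

submit_alignments

# Afterwards, inspect or cancel jobs with:
#   squeue -u "$USER"     # list your queued and running jobs
#   scancel <JOBID>       # cancel one job by its ID
```

Submitting one job per file lets the scheduler run the alignments side by side across the available nodes, which is what the final squeue output in step 7 shows.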