Metagenomics Summer School

Course materials for the Genomics Aotearoa Metagenomics Summer School, to be hosted at the University of Auckland in December.

A draft timetable for each day is provided below, but please keep in mind that it is subject to change as we evaluate our course material.


Useful locations and links

Working directories

For all exercises after the bash introduction, you will be working from the file path

/nesi/nobackup/nesi02659/MGSS_U/YOUR_USERNAME/
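
As a minimal sketch of moving into that directory from a terminal, the snippet below builds the path from the base location given above. It assumes that your NeSI login name (taken here from the `USER` environment variable) is the same as the `YOUR_USERNAME` folder; the variable names `MGSS_BASE`, `MGSS_USER` and `WORKDIR` are only illustrative.

```shell
# Base path as given in the text above.
MGSS_BASE="/nesi/nobackup/nesi02659/MGSS_U"

# Assumption: your folder is named after your login name.
# Falls back to the literal placeholder if USER is not set.
MGSS_USER="${USER:-YOUR_USERNAME}"
WORKDIR="${MGSS_BASE}/${MGSS_USER}/"

# Only change directory if the folder actually exists.
if [ -d "$WORKDIR" ]; then
    cd "$WORKDIR" && pwd
else
    echo "Working directory not found: $WORKDIR" >&2
fi
```

If the folder is reported as missing, check the spelling of your username in the path before starting an exercise.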

bash/slurm cheatsheet

A few helpful commands and shortcuts for working in bash or with slurm can be found here.

Snapshots of results to download

If you are having trouble downloading files using scp, we are providing exemplar output files which you can download through your browser, here.

Slides for workshop

You can find a copy of the slides presented during the workshop, with published figures removed, in the slides/ folder.

Etherpad snapshot

You can find a saved version of the workshop Etherpad notes here.


Workshop exercises

Day 1

  1. Bash scripting
  2. Quality filtering raw reads
  3. Assembly (part 1)
  4. Assembly (part 2)

Day 2

  1. Evaluating the overnight assembly
  2. Binning (part 1, read mapping)
  3. Binning (part 2, initial binning)
  4. Binning (part 3, dereplication)

Day 3

  1. Bin refinement
  2. Viruses
  3. Coverage and Taxonomy
  4. Gene prediction
  5. Gene annotation (part 1)
  6. Gene annotation (part 2)

Day 4

  1. Presentation of data
  2. Optional: Working with dRep

Timetable

Day 1 - 10th December 2019

| Time | Event | Session leader |
|---|---|---|
| 9:00 am – 9:45 am | Introduction; Welcome; Logging into NeSI | David Waite |
| 9:45 am – 10:30 am | TASK: Bash scripting | Dinindu Senanayake; Ngoni Faya |
| 10:30 am – 10:50 am | Morning tea break | |
| 10:50 am – 11:30 am | TASK: Bash scripting (continued) | Dinindu Senanayake; Ngoni Faya |
| 11:30 am – 12:00 pm | The metagenomics decision tree; TASK: Dividing into working groups; TASK: Select a goal for your project | Kim Handley |
| 12:00 pm – 12:45 pm | Break for lunch | |
| 12:45 pm – 1:45 pm | Quality filtering raw reads; TASK: Visualisation with FastQC; TASK: Read trimming and adapter removal; Diagnosing poor libraries; Common issues and best practice | Florian Pichlmuller |
| 1:45 pm – 3:00 pm | Assembly (part 1); Choice of assemblers; Considerations for parameters, and when to stop!; TASK: Exploring assembler options; TASK: Submitting jobs to NeSI via slurm | David Waite |
| 3:00 pm – 3:20 pm | Afternoon tea break | |
| 3:20 pm – 3:45 pm | Assembly (part 2); TASK: Submitting variant assemblies to NeSI | David Waite |
| 4:00 pm – 5:00 pm | End of day wrap up; Attendees can work with their own data, if available | Kim Handley; David Waite |

Day 2 - 11th December 2019

| Time | Event | Session leader |
|---|---|---|
| 9:00 am – 9:30 am | Introduction; Overview of yesterday, questions | Kim Handley |
| 9:30 am – 10:30 am | Evaluating the overnight assembly; TASK: Run evaluation tool/script | Kim Handley |
| 10:30 am – 10:50 am | Morning tea break | |
| 10:50 am – 11:20 am | Overview of binning history | Kim Handley |
| 11:20 am – 12:00 pm | Binning (part 1); TASK: Short contig removal; TASK: Read mapping | Kim Handley |
| 12:00 pm – 12:45 pm | Break for lunch | |
| 12:45 pm – 1:15 pm | Overview of binning history (continued); Key parameters and strategies for binning | Kim Handley |
| 1:15 pm – 1:45 pm | Binning (part 2); TASK: Multi-binning strategy | Kim Handley |
| 1:45 pm – 3:00 pm | Binning (part 3); TASK: Bin dereplication via DAS_Tool; TASK: Evaluating bins using CheckM | Kim Handley |
| 3:00 pm – 3:20 pm | Afternoon tea break | |
| 3:20 pm – 4:00 pm | Binning (part 4); Discuss additional dereplication strategies, such as dRep; How to work with viral and eukaryotic bins; Dealing with organisms which possess minimal genomes | Kim Handley; David Waite |
| 4:00 pm – 5:00 pm | End of day wrap up; Optional: View assemblies with Tablet; Attendees can work with their own data, if available | Kim Handley; David Waite |

Day 3 - 12th December 2019

| Time | Event | Session leader |
|---|---|---|
| 9:00 am – 9:30 am | Introduction; Overview of yesterday, questions; Overview of today | David Waite |
| 9:30 am – 10:30 am | Bin refinement; Refinement strategies - VizBin and ESOMana; TASK: Working with VizBin; TASK: Bin taxonomy with GTDB-Tk | David Waite |
| 10:30 am – 10:50 am | Morning tea break | |
| 10:50 am – 11:30 am | Gene prediction; Introduce prodigal, discuss single vs anon mode; Discuss what prodigal can't find, and where other tools are needed (RNAmmer, Aragorn, etc.); TASK: Predicting genes with prodigal and FragGeneScan | Christina Straub |
| 11:30 am – 12:00 pm | Gene annotation (part 1); BLAST-like gene annotation using usearch or diamond; Introduce the different databases, highlight our reasons for KEGG; Evaluating the quality of gene assignment; Differences in taxonomies (GTDB, NCBI, etc.) | David Waite |
| 12:00 pm – 12:45 pm | Break for lunch | |
| 12:45 pm – 3:00 pm | Gene annotation (part 2); TASK: Performing annotation with diamond; TASK: Examining gene networks in MEGAN; TASK: Tie findings to your initial goal | David Waite |
| 3:00 pm – 3:20 pm | Afternoon tea break | |
| 3:20 pm – 4:00 pm | Gene annotation; Using online resources (KEGG, BioCyc, MetaCyc, HydDB, PSORT); TASK: Tie findings to your initial hypothesis | Kim Handley |
| 4:00 pm – 5:00 pm | End of day wrap up; Attendees can work with their own data, if available | Kim Handley; David Waite |

Day 4 - 13th December 2019

| Time | Event | Session leader |
|---|---|---|
| 9:00 am – 9:30 am | Introduction; Overview of yesterday, questions; Overview of today | Kim Handley |
| 9:30 am – 10:30 am | Gene annotation (refresher); TASK: Tie findings to your initial goal; TASK: Prepare group presentation | Kim Handley |
| 10:30 am – 10:50 am | Morning tea break; TASK: Survey | |
| 10:50 am – 12:00 pm | Present and discuss findings; TASK: Each group to give a casual discussion of their data; What were you looking for, what did you find?; Which databases were most helpful? | Kim Handley |
| 12:00 pm – 12:45 pm | Break for lunch | |
| 12:45 pm – 3:00 pm | Presentation of data; How to visualise findings: metabolism maps, heatmaps, gene trees; TASK: Gene synteny alignments and heatmaps | David Waite |
| 3:00 pm – 3:20 pm | Afternoon tea break | |
| 3:20 pm – 4:00 pm | End of day wrap up; Final discussion | Kim Handley; David Waite |