
Lectures and Lab Exercises for Biochem 3BP3 Practical Bioinformatics for the Genomics Era

License: GNU General Public License v3.0 (GPL-3.0)

BIOCHEM 3BP3 Practical Bioinformatics in the Genomics Era

Department of Biochemistry & Biomedical Sciences, Faculty of Health Sciences, McMaster University, Hamilton, Ontario, Canada

This is a living document; content will be updated frequently.

Introduction to bioinformatics theory, tools, and practice with an emphasis on high-throughput DNA sequencing technologies. Areas of emphasis include gene sequence analysis, functional prediction, genome assembly and annotation, gene expression analysis, gene regulation analysis, genome databases, and microbial genomics. Includes an introduction to the command line, software development, and cloud computing.

By the end of this course, the student should have practical skills with a number of bioinformatics techniques common in a modern research laboratory, familiarity with online databases and their use, and a knowledge of the use of genomics data for hypothesis testing.

https://academiccalendars.romcmaster.ca/preview_course_nopop.php?catoid=24&coid=142047

This GitHub repository only contains material developed by Dr. McArthur directly and does not include guest lectures, student-generated content, or course documents. These are only available to registered students via Avenue to Learn. In addition, some of the exercises require password access to class servers, available to registered students only. Access can be provided on request to undergraduate and graduate students in Biochemistry & Biomedical Sciences, the Michael G. DeGroote Institute for Infectious Disease Research, or other affiliated programs. Please see License and Copyright information.

Those with McMaster University credentials can view the entire Lecture & Lab Video Collection for BIOCHEM 3BP3 Fall 2023.

Course Schedule Fall 2023

| Week | Dates | Lecture | Tutorial | Flash Updates |
| --- | --- | --- | --- | --- |
| 1 | September 5 & 7 | Lecture 1: Introduction to Bioinformatics & the Course video ~43 minutes | Tours of FHS SeqCore | |
| 2 | September 12 & 14 | Tours of SHARCNET | Lab 1: Introduction to Lab & Genome Databases | GenBank, Ensembl, Growth of Sequencing Data |
| 3 | September 19 & 21 | Lecture 2: Sequence Similarity & Searching video ~35 minutes | Lab 2: Searches, Protein Annotation | BLAST, Pfam, PROSITE |
| 4 | September 26 & 28 | Lecture 3: Evolutionary Biology video ~38 minutes; Bonus: Bayesian Phylogenetics | Lab 3: Phylogeny Lab | Terminology, Sequence Alignment, Phylogenetic Trees |
| 5 | October 3 & 5 | Lecture 4: Beyond the Gene - Networks, Ontologies video ~28 minutes | Lab 4: Ontology and Antimicrobial Resistance | Gene Ontology, KEGG, CARD |
| 6 | October 10 & 12 | (mid-term recess) | | |
| 7 | October 17 & 19 | NO CLASS | Lab 5: Linux & Sequencing Informatics (demo) | Sanger Sequencing, FASTA, Linux |
| 8 | October 24 & 26 | Lecture 6: DNA Sequencing & Genome Assembly video ~39 minutes; Bonus: De Bruijn graph walkthrough | Lab 6: Galaxy, FASTQ, Assembly | Illumina Sequencing, FASTQ, Galaxy |
| 9 | October 31 & November 2 | Lecture 7: Molecular Epidemiology video ~36 minutes | Lab 7: SNP analysis & Molecular Epidemiology | SNPs, Horizontal Gene Transfer, Metagenomics |
| 10 | November 7 & 9 | Lecture 8: Gene Expression Analysis video ~48 minutes | Lab 8: Microarray Lab (demo) | Microarrays, Normalization, False Discovery |
| 11 | November 14 & 16 | Lecture 9: RNA-Seq, ChIP-Seq, Bisulfite-Seq video ~33 minutes | Lab 9: RNA-Seq | RNA-Seq, Illumina HT-12, Tn-Seq |
| 12 | November 21 & 23 | Lecture 10: Advances in DNA Sequencing video ~32 minutes | Guest Lecture & Lab: Dr. Samantha Wilson | Random Forest, Logistic Regression, Natural Language Processing |
| 13 | November 28 & 30 | Lecture 11: Genomics of Pandemics video ~80 minutes | | |
| 14 | December 5 | Lecture 12: Internet of Things & Big Data video ~44 minutes | | |

  • All assignments are to be submitted to A2L by 11:59 pm on the date the assignment is due unless otherwise stated.
  • The Critical Review and Reflective Essay are to be submitted to the assessment drop box on A2L by 11:59 pm on October 24, 2023 and December 6, 2023, respectively.
  • Throughout the term, each student will give a single 10-minute Flash Update presentation on an assigned topic and must upload their slides to A2L before the start of their tutorial.

Supplementary Videos

These videos were recorded during Fall 2020 and are not official course content for Fall 2023. Viewing requires McMaster authorized access. Please note that Stream (Classic) is being decommissioned and all videos will be migrated to Stream (on OneDrive and SharePoint) in August 2023.

  • Dr. Joanna Wilson, Department of Biology, McMaster University discusses their research program in aquatic toxicology and the role of genomics and bioinformatics in their research, video ~10 minutes
  • Dr. Christine Mader, Farncombe Metagenomics DNA Sequencing Core, McMaster University provides an overview of McMaster's high-throughput DNA sequencing facility, video ~72 minutes
  • Mark Hahn, SHARCNET/Digital Alliance provides an overview of the SHARCNET high-performance computing platform, video ~50 minutes
  • Dr. Robyn Lee, Dalla Lana School of Public Health discusses critical infectious disease analyses in the Canadian north, video ~7 minutes
  • Dr. Fiona Whelan, University of Nottingham discusses genomics and bioinformatics of the human microbiome, video ~6 minutes
  • Dr. Guillaume Paré, Population Health Research Institute, McMaster University discusses exome sequencing and the genetics of cardiovascular disease, video ~17 minutes
  • Dr. Shawn Hercules, Department of Biology, McMaster University discusses the genetic underpinnings of triple negative breast cancer, video ~56 minutes
  • Drs. Michael Chong & Ricky Lali, Population Health Research Institute, McMaster University discuss genome-wide association and cardiovascular disease, video ~95 minutes
  • Dr. Sandrine Moreira, Institut national de santé publique du Québec discusses implementation of microbial genomics in a public health lab, video ~50 minutes
  • Dr. Kara Tsang, London School of Hygiene & Tropical Medicine discusses analytics and machine learning to predict antimicrobial resistance, video ~50 minutes

Flash Updates

WEEK 2 - GenBank, Ensembl, Growth of Sequencing Data

WEEK 3 - BLAST, Pfam, PROSITE

  • BLAST. Provide a review of the purpose of BLAST algorithms for database searching and how to perform them online. Specifically, outline the difference between BLASTN, BLASTP, BLASTX, TBLASTN, and TBLASTX. See Nature Education 1: 215 & Curr Protoc Mol Biol. 2001 May;Chapter 19:Unit 19.3 PMID 18265177.
  • Pfam. Provide a review of the Pfam resource, with an emphasis on the variety of tools and data it offers (as well as its migration to InterPro). See Nucleic Acids Res. 2021 Jan 8;49(D1):D412-D419 PMID 33125078 and Nucleic Acids Res. 2023 Jan 6; 51(D1): D418–D427 PMID 36350672.
  • PROSITE. Provide a review of the PROSITE resource, with an emphasis on the variety of tools and data it offers. See Nucleic Acids Res. 2013 41(Database issue):D344-7 PMID 23161676 and the PROSITE website.
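
As a starting point for the BLAST topic, the five programs differ in the type of query, the type of database, and whether nucleotide sequences are translated (in all six reading frames) before comparison. The following is a minimal Python sketch of that lookup; the names `BLAST_PROGRAMS` and `pick_program` are hypothetical and are not part of any BLAST distribution.

```python
# Each entry: program -> (query type, database type, level at which sequences are compared).
# A "protein" comparison level with a nucleotide input means that input is
# translated in all six reading frames before searching.
BLAST_PROGRAMS = {
    "blastn": ("nucleotide", "nucleotide", "nucleotide"),
    "blastp": ("protein", "protein", "protein"),
    "blastx": ("nucleotide", "protein", "protein"),
    "tblastn": ("protein", "nucleotide", "protein"),
    "tblastx": ("nucleotide", "nucleotide", "protein"),
}


def pick_program(query, database, compare_as):
    """Return the BLAST program matching the query/database/comparison types."""
    for name, spec in BLAST_PROGRAMS.items():
        if spec == (query, database, compare_as):
            return name
    raise ValueError("no BLAST program for this combination")


# A nucleotide query (e.g. an unannotated contig) against a protein database:
print(pick_program("nucleotide", "protein", "protein"))
```
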

WEEK 4 - Terminology, Sequence Alignment, Phylogenetic Trees

  • Terminology. Explain the difference between the terms “similarity” and “homology”. Differentiate between the terms “homolog”, “paralog”, and “ortholog”. See Annu Rev Genet. 2005;39:309-38 PMID 16285863 and BLAST Glossary.
  • Sequence Alignment. Explain the difference between local alignment (e.g. BLAST) and global alignment (e.g. CLUSTAL) and introduce the CLUSTAL family of algorithms. See Protein Sci. 2018 Jan;27(1):135-145 PMID 28884485.
  • Phylogenetic Trees. Give an overview of what a phylogenetic tree represents and the major concepts for its interpretation. See Nature Education 1: 190 and How to read a phylogenetic tree.
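
To make the local versus global distinction in the Sequence Alignment topic concrete, the sketch below scores two sequences both ways using textbook dynamic programming: Needleman-Wunsch for global alignment and Smith-Waterman for local alignment. These are the exact pairwise algorithms, not what BLAST (a heuristic local search) or CLUSTAL (progressive multiple alignment) actually run, and the scoring values (match +1, mismatch -1, gap -1) and example sequences are assumptions chosen for illustration.

```python
def alignment_score(a, b, local=False, match=1, mismatch=-1, gap=-1):
    """Best pairwise alignment score by dynamic programming.

    local=False: Needleman-Wunsch (global, both sequences aligned end to end).
    local=True:  Smith-Waterman (local, best-scoring pair of sub-regions).
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    if not local:
        # Global alignment pays for leading gaps; local alignment does not.
        for i in range(1, rows):
            score[i][0] = i * gap
        for j in range(1, cols):
            score[0][j] = j * gap
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cell = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
            if local:
                cell = max(cell, 0)  # a local alignment may restart anywhere
                best = max(best, cell)
            score[i][j] = cell
    return best if local else score[-1][-1]


# Hypothetical sequences that share only the short region "ACGT".
seq_a = "GGGGACGT"
seq_b = "ACGTCCCC"
print("global:", alignment_score(seq_a, seq_b))
print("local: ", alignment_score(seq_a, seq_b, local=True))
```

The local score reflects only the shared region, while the global score is dragged down by the unrelated flanking bases that must also be aligned.
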

WEEK 5 - Gene Ontology, KEGG, CARD

  • Gene Ontology. Introduce the Gene Ontology. See Nucleic Acids Res. 2019 Jan 8;47(D1):D330-D338 PMID 30395331 and Genetics 2023 May 4;224(1):iyad031 PMID 36866529.
  • KEGG. Introduce the Kyoto Encyclopedia of Genes and Genomes (KEGG). See Nucleic Acids Res. 2023 Jan 6;51(D1):D587-D592 PMID 36300620 and Nucleic Acids Res. 2019 Jan 8;47(D1):D590-D595 PMID 30321428.
  • CARD. Introduce the Comprehensive Antibiotic Resistance Database. See Nucleic Acids Res. 2023 Jan 6;51(D1):D690-D699 PMID 36263822 and Nucleic Acids Res. 2020 48(Database issue):D517-D525 PMID 31665441.

WEEK 7 - Sanger Sequencing, FASTA, Linux

  • Sanger Sequencing. Review the Sanger DNA sequencing method, with emphasis upon automation by Applied Biosystems. See Nature Education 1:193 and The Order of Nucleotides in a Gene Is Revealed by DNA Sequencing. Note: You do not need to introduce 454, Illumina, or Next-Generation Sequencing (NGS) methods.
  • FASTA. Introduce the FASTA file format, review its origins, and illustrate how it was adapted for raw DNA sequencing results. Also introduce the concept of quality scores generated by the legacy base calling software PHRED (the QUAL format file). See Wikipedia, PHRED, and Nucleic Acids Res. 2010 38:1767-71 PMID 20015970. Note: You do not need to introduce the FASTQ format for Next-Generation Sequencing (NGS) methods.
  • Linux. Introduce the concept of operating systems (Windows, Mac, “command line”). Give a brief history of the origins of UNIX and how it differs from LINUX. See What is Linux, Differentiating UNIX and Linux, and Difference between Unix and Linux.
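
For the FASTA topic, the sketch below reads a FASTA record and its companion QUAL record, then converts each Phred quality score Q to a base-calling error probability using P = 10^(-Q/10). It is a minimal illustration only: the function names, the read name, and the sequence and quality values are all hypothetical, and a real analysis would normally use an established parser.

```python
import io


def read_fasta(handle, joiner=""):
    """Yield (header, body) pairs from FASTA-style text.

    Use joiner="" for sequence files and joiner=" " for QUAL files, where each
    body line is a run of space-separated integer quality scores.
    """
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, joiner.join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, joiner.join(chunks)


def phred_to_error_probability(q):
    """Phred quality Q = -10 * log10(P), so P = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


# Hypothetical example data: one read in FASTA format and its matching QUAL record.
fasta_text = """>read_1 hypothetical Sanger read
ACGTACGT
ACGT
"""
qual_text = """>read_1 hypothetical Sanger read
10 20 30 40 40 40 40 40
30 20 10 10
"""

header, sequence = next(read_fasta(io.StringIO(fasta_text)))
_, qual_line = next(read_fasta(io.StringIO(qual_text), joiner=" "))
qualities = [int(q) for q in qual_line.split()]

for base, q in zip(sequence, qualities):
    print(base, q, phred_to_error_probability(q))
```
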

WEEK 8 - Illumina Sequencing, FASTQ, Galaxy

WEEK 9 - SNPs, Horizontal Gene Transfer, Metagenomics

  • SNPs. Define the term Single Nucleotide Polymorphism (SNP) and explain how these data can be used to determine organism/strain relatedness. Use SARS-CoV-2 as an example; see Microbiol Spectr. 2023 Jun 15;11(3):e0190022 PMID 37093060 and Phylogenetic Analysis of SARS-CoV-2 in Ontario.
  • Horizontal Gene Transfer. Define the term Horizontal Gene Transfer (HGT; also known as Lateral Gene Transfer, LGT) and discuss how it could confound determination of organism/strain relatedness using SNP analysis. Use the emergence of MCR-1 as an example; see Lancet Infect Dis. 2015 Nov 18. pii: S1473-3099(15)00424-7 PMID 26603172.
  • Metagenomics. Introduce metagenomics in the context of molecular and clinical epidemiology. See Expert Rev Mol Diagn. 2018 Jul;18(7):605-615. PMID 29898605.
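
One simple way to see how SNPs indicate relatedness is to count the differing positions between each pair of aligned genomes; fewer differences suggest closer relatedness. The sketch below does this for three made-up, pre-aligned sequences. The isolate names, sequences, and the function `snp_distance` are hypothetical, and the approach assumes the sequences are already aligned to equal length. It makes no attempt to detect horizontally transferred regions, which can inflate or distort such counts.

```python
from itertools import combinations


def snp_distance(seq_a, seq_b):
    """Count aligned positions where both sequences have an unambiguous but different base."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    valid = set("ACGT")  # skip gaps ("-") and ambiguous calls such as "N"
    return sum(
        1
        for x, y in zip(seq_a.upper(), seq_b.upper())
        if x in valid and y in valid and x != y
    )


# Hypothetical aligned sequences from three isolates.
isolates = {
    "isolate_1": "ACGTACGTAC",
    "isolate_2": "ACGTACGTAT",
    "isolate_3": "ACTTACGAAT",
}

distances = {
    (a, b): snp_distance(isolates[a], isolates[b])
    for a, b in combinations(isolates, 2)
}
for pair, d in distances.items():
    print(pair, d)
```
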

WEEK 10 - Microarrays, Normalization, False Discovery

WEEK 11 - RNA-Seq, Illumina HT-12, Tn-Seq

  • RNA-Seq. Give an overview of the steps in RNA-Seq analysis of transcriptomes. See Nat Rev Genet. 10:57-63. PMID 19015660 and Study gene expression using RNA sequencing.
  • Illumina Bead Microarrays. Introduce ‘bead chip’ technologies for measurement of gene expression levels. Contrast the method with RNA-Seq and traditional two-channel microarrays. Illustrate how the technology can be used for gene expression, gene copy number, and gene methylation measurement. See Bead-Based Microarray Technology and embedded links.
  • Tn-Seq. Provide an overview of the Tn-Seq approach to examining bacterial genetics. See MBio 2:e00315-10. PMID 21253457.

WEEK 12 - Random Forest, Logistic Regression, Natural Language Processing