Giving myself a bioinformatics degree

Bioinformatics Training

A collection of educational content and software for learning the beauty of bioinformatics.

Background

Update: February 2021

I started a new journey at Freenome as a Bioinformatics Research Engineer.

Update: September 2020

In September 2019, I started working as a software engineer at DNAnexus. During my first week I joined an internal group called Science Frontiers, the equivalent of 20% Projects at Google. Within a few weeks, I began collaborating with a bioinformatician and microbiome specialist on one of his projects, which involved clustering microbial data from the Human Microbiome Project. Humbled by the vastness of this field, the dedication of our researchers, and the incredible applications of bioinformatics in general, I developed a deep curiosity to understand the fundamentals of this science. I began by picking the brains of the many scientists at the company and reading layman books such as The Gene, while making progress in both my day-to-day work and the Science Frontiers project. Then COVID-19 hit. With essentially no social life, I thought this would be as good a time as any to give myself an unofficial degree in bioinformatics, ideally one that would quickly amount to the equivalent of an undergraduate degree at a notable university.

Topics (TODO: add links to everything)

Motif Finding

Genome Assembly

  • De Bruijn graphs
  • Cyclopeptide sequencing
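
As a rough illustration of the De Bruijn graph topic above, here is a minimal sketch that builds the graph from a set of reads. The function name `de_bruijn_graph` and the adjacency-list representation are my own choices for illustration, and the sketch assumes error-free reads and ignores reverse complements.

```python
from collections import defaultdict


def de_bruijn_graph(reads, k):
    """Build a De Bruijn graph as an adjacency list.

    Nodes are (k-1)-mers. Every k-mer found in a read adds one edge
    from its (k-1)-mer prefix to its (k-1)-mer suffix.
    """
    graph = defaultdict(list)
    for read in reads:
        # Slide a window of length k across the read.
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].append(kmer[1:])
    return dict(graph)


if __name__ == "__main__":
    for node, successors in de_bruijn_graph(["ACGTAC"], 3).items():
        print(node, "->", ", ".join(successors))
```

Reconstructing a genome from this graph then amounts to finding an Eulerian path through it, which the sketch leaves out.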

Sequence Alignment

  • Smith-Waterman, Needleman-Wunsch, and Hirschberg algorithms
  • Multiple sequence alignment
  • PAM / BLOSUM scoring matrices
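
To make the global alignment topic above concrete, here is a minimal sketch of the Needleman-Wunsch dynamic program that returns only the optimal score. The function name and the default scores (match +1, mismatch -1, linear gap -1) are assumptions for illustration; a real protein alignment would typically use a PAM or BLOSUM matrix instead of the simple match/mismatch rule.

```python
def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-1):
    """Return the optimal global alignment score of strings a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]

    # Aligning a prefix against the empty string costs one gap per character.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    for i in range(1, rows):
        for j in range(1, cols):
            # Diagonal move: align a[i-1] with b[j-1].
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Vertical and horizontal moves: insert a gap in b or in a.
            up = score[i - 1][j] + gap
            left = score[i][j - 1] + gap
            score[i][j] = max(diag, up, left)

    return score[-1][-1]


if __name__ == "__main__":
    print(needleman_wunsch_score("GATTACA", "GCATGCU"))
```

Smith-Waterman differs mainly in clamping each cell at zero and taking the maximum over the whole matrix, while Hirschberg computes the same global alignment in linear space.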

Genome Rearrangement

  • Random Breakage, Fragile Breakage, and Whole Genome Duplication models
  • Synteny block graphs, 2-break distance, and 2-break sorting

Molecular Evolution

  • Additive phylogeny
  • Unweighted Pair Group Method with Arithmetic Mean (UPGMA)
  • Neighbor-Joining
  • Maximum Parsimony

Genomic Data Science

  • Clustering gene expression matrices: k-means, soft k-means, and hierarchical clustering

Tools & Databases

Bioinformatics Platforms

  • DNAnexus (thanks for the employment)
  • Galaxy
  • Glow
  • BoaG

General Platforms with bioinformatics capabilities

  • EMBL-EBI

Motif discovery

  • Consensus
  • MEME

Alignment / Mapping

  • Samtools
  • BLAST
  • EMBOSS Water/Needle
  • Clustal Omega

Variant calling

  • Samtools

Quality Control (QC) / Trimming

  • FastQC
  • Scythe
  • Sickle
  • MultiQC

Assembly

  • Quast

Molecular Evolution

  • MEGA

File formats

  • FASTA/FASTQ
  • VCF
  • SAM
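
For a feel of the simplest of these formats, here is a minimal sketch of a FASTA reader. The function name `parse_fasta` is hypothetical, and the sketch assumes well-formed input with unique headers; for real work a maintained parser is the safer choice.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a dict of header -> sequence.

    A record starts with a '>' header line; the lines that follow,
    up to the next header, are joined into one sequence string.
    """
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:]
            records[header] = []
        elif header is not None:
            records[header].append(line)
    return {name: "".join(parts) for name, parts in records.items()}


if __name__ == "__main__":
    example = ">seq1 example\nACGT\nACGT\n>seq2\nTTGA\n"
    for name, sequence in parse_fasta(example).items():
        print(name, sequence)
```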

Databases

  • GenBank

Educational Material

Courses

  • UCSD Bioinformatics Algorithms Specialization (Coursera)
  • Johns Hopkins Genomic Data Science (Coursera)
  • Bioinformatics: Tools for Genome Analysis (Johns Hopkins AS.410.635.82)

Textbooks

  • Bioinformatics Algorithms
  • Bioinformatics with Python Cookbook
  • Molecular Biology: Principles of Genome Function
  • Molecular Population Genetics
  • An Introduction to Population Genetics

Tutorials

Layman Books

  • The Gene by Siddhartha Mukherjee

Problem solving

  • Stepik
  • Rosalind

Blogs

Forums

  • Biostars

Seminal Papers & Milestones

Open research questions

Genome rearrangement

  • Where are the fragile regions located? What causes fragility?

Industry (TODO: categorize)

  • DNAnexus
  • Illumina
  • Pacific Biosciences
  • GRAIL
  • Freenome

Proteomics (TODO)

...