Applied Statistics for High-Throughput Biology

MIT License

Syllabus: Applied Statistics for High-Throughput Biology with Applications to Single-cell Sequencing

Instructor

Levi Waldron, PhD
Professor of Biostatistics
City University of New York Graduate School of Public Health and Health Policy
New York, NY, U.S.A.

Email: lwaldron.research@gmail.com

Summary

This course will provide biologists and bioinformaticians with practical statistical and data analysis skills to perform rigorous analysis of high-throughput biological data, with applications focused on single-cell sequencing. The course assumes some familiarity with genomics and with R programming, but does not assume prior statistical training. It covers the statistical concepts necessary to design experiments and analyze high-dimensional data generated by genomic technologies, including: exploratory data analysis, linear modeling, analysis of categorical variables, principal components analysis and other dimension reduction methods, multiple hypothesis testing, and batch effects.

Textbooks

Labs

Each day will include a hands-on lab session from Orchestrating Single-Cell Analysis with Bioconductor, which students should attempt in full.

Session detail by day

Lecture materials are available in HTML format from https://waldronlab.io/AppStatBio/.

  1. Introduction
  • random variables
  • distributions
  • hypothesis testing for one or two samples (t-test, Wilcoxon test, etc.)
  2. Dimensionality reduction
  • distances in high dimensions
  • principal components analysis and singular value decomposition
  • multidimensional scaling
  • t-SNE and UMAP
  3. Linear modeling
  • multiple linear regression
  • model formulae
  • generalized linear models
  • multiple hypothesis testing
  4. Exploratory data analysis and batch effects
  • plots for exploratory data analysis
  • batch effects