Syllabus: Applied Statistics for High-Throughput Biology with Applications to Single-cell Sequencing
Levi Waldron, PhD
Professor of Biostatistics
City University of New York Graduate School of Public Health and Health Policy
New York, NY, U.S.A.
Email: lwaldron.research@gmail.com
This course will provide biologists and bioinformaticians with practical statistical and data analysis skills to perform rigorous analysis of high-throughput biological data, with applications focused on single-cell sequencing. The course assumes some familiarity with genomics and with R programming, but does not assume prior statistical training. It covers the statistical concepts necessary to design experiments and analyze high-dimensional data generated by genomic technologies, including: exploratory data analysis, linear modeling, analysis of categorical variables, principal components analysis and other dimension reduction methods, multiple hypothesis testing, and batch effects.
- Primary: Biomedical Data Science by Irizarry and Love (ePub version on Leanpub)
- Secondary: Modern Statistics for Modern Biology by Holmes and Huber
- Lab materials: Orchestrating Single-Cell Analysis with Bioconductor (OSCA) by Amezquita, Lun, Hicks, Gottardo, O’Callaghan
Each day will include a hands-on lab session from Orchestrating Single-Cell Analysis with Bioconductor, which students should attempt in full.
Lecture materials are available in HTML format from https://waldronlab.io/AppStatBio/.
- Introduction
  - random variables
  - distributions
  - hypothesis testing for one or two samples (t-test, Wilcoxon test, etc.)
- Dimensionality reduction
  - distances in high dimensions
  - principal components analysis and singular value decomposition
  - multidimensional scaling
  - t-SNE and UMAP
- Linear modeling
  - multiple linear regression
  - model formulae
  - generalized linear models
  - multiple hypothesis testing
- Exploratory data analysis and batch effects
  - plots for exploratory data analysis
  - about batch effects