
Defense Against the Dark Arts

Primary language: Jupyter Notebook. License: Apache License 2.0 (Apache-2.0).

BCB 731: Critical readings in biomedical statistics and machine learning

BCB 731 (a.k.a. Defense Against the Dark Arts) is a survey of recurring statistical errors and pitfalls that are sometimes used to exaggerate the weight of evidence for novel biological claims or to inflate the estimated accuracy of proposed predictive biomedical models. The course focuses on misapplied analyses of data sources in which a small number of biological samples are quantified into very high-dimensional feature spaces, such as in genomics, proteomics, and biomedical imaging.

Crucially, this is not a course about data falsification or intentional research misconduct. Our focus is the hazy space in which good intentions meet flawed incentives, motivated reasoning, and high-dimensional data.

Fall 2023 Schedule

| Date | Topic | Papers |
|------|-------|--------|
| 10/2 | Reproducibility, and the lack thereof, in scientific research | |
| 10/4 | Empiricism, scientific models, statistics, machine learning, and data analysis | |
| 10/9 | Machine learning, model evaluation, overfitting, and generalization (whiteboard) | |
| 10/16 | The frequentist hypothesis testing version of overfitting: p-hacking, HARKing, & related phenomena (whiteboard) | |
| 10/18 | Into the Garden of Forking Paths (studies with same data and many analysts) | |
| 10/23 | Optimist: Genetic basis for clinical response to CTLA-4 blockade in melanoma | Snyder 2014 NEJM |
| 10/25 | Critic: Genetic basis for clinical response to CTLA-4 blockade in melanoma | Snyder 2014 NEJM |
| 10/30 | Optimist: A neoantigen fitness model predicts tumor response to checkpoint blockade immunotherapy | Łuksza 2017 Nature |
| 11/1 | Critic: A neoantigen fitness model predicts tumor response to checkpoint blockade immunotherapy | Łuksza 2017 Nature |
| 11/6 | Optimist: Key Parameters of Tumor Epitope Immunogenicity Revealed Through a Consortium Approach Improve Neoantigen Prediction | Wells 2020 Cell |
| 11/8 | Critic: Key Parameters of Tumor Epitope Immunogenicity Revealed Through a Consortium Approach Improve Neoantigen Prediction | Wells 2020 Cell |
| 11/13 | Beginner p-hacking bootcamp: leaking labels through feature construction and selection (notebook) | |
| 11/15 | Intermediate p-hacking bootcamp: overfitting a classifier from metadata (notebook) | |
| 11/20 | Advanced p-hacking bootcamp | |
| 11/27 | Optimist: Microbiome analyses of blood and tissues suggest cancer diagnostic approach | Poore 2020 Nature |
| 11/29 | Critic: Microbiome analyses of blood and tissues suggest cancer diagnostic approach | Poore 2020 Nature |
| 12/4 | | |
| 12/6 | | |
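
The beginner bootcamp topic, leaking labels through feature selection, can be illustrated with a small simulation. The sketch below is not taken from the course notebooks; it is a minimal example under stated assumptions: pure-noise features, balanced random labels, a hand-rolled nearest-centroid classifier, and hypothetical helper names (`top_k_features`, `nearest_centroid_predict`, `cv_accuracy`). Because the data contain no signal, any cross-validated accuracy well above 50% is an artifact of the analysis. Selecting features with all labels before cross-validation should produce such an artifact, while repeating the selection inside each training fold should not.

```python
import numpy as np


def top_k_features(X, y, k):
    """Indices of the k features with the largest class-mean difference (uses labels)."""
    diff = X[y == 1].mean(axis=0) - X[y == 0].mean(axis=0)
    return np.argsort(-np.abs(diff))[:k]


def nearest_centroid_predict(X_train, y_train, X_test):
    """Assign each test sample to the class whose training centroid is closer."""
    c0 = X_train[y_train == 0].mean(axis=0)
    c1 = X_train[y_train == 1].mean(axis=0)
    d0 = ((X_test - c0) ** 2).sum(axis=1)
    d1 = ((X_test - c1) ** 2).sum(axis=1)
    return (d1 < d0).astype(int)


def cv_accuracy(X, y, k, n_folds, select_inside_fold, rng):
    """K-fold accuracy, with feature selection either inside or outside the folds."""
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    # Leaky choice: features picked once using ALL labels, test folds included.
    global_feats = top_k_features(X, y, k)
    correct = 0
    for test_idx in folds:
        train_mask = np.ones(len(y), dtype=bool)
        train_mask[test_idx] = False
        X_tr, y_tr = X[train_mask], y[train_mask]
        if select_inside_fold:
            # Honest choice: features picked from the training fold only.
            feats = top_k_features(X_tr, y_tr, k)
        else:
            feats = global_feats
        pred = nearest_centroid_predict(X_tr[:, feats], y_tr, X[test_idx][:, feats])
        correct += int((pred == y[test_idx]).sum())
    return correct / len(y)


# Assumed toy setting: 50 samples, 5000 noise features, no real signal.
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 5000))
y = np.repeat([0, 1], 25)

leaky_acc = cv_accuracy(X, y, k=20, n_folds=5, select_inside_fold=False, rng=rng)
honest_acc = cv_accuracy(X, y, k=20, n_folds=5, select_inside_fold=True, rng=rng)
print(f"selection before CV (leaky): {leaky_acc:.2f}")
print(f"selection inside CV (honest): {honest_acc:.2f}")
```

The only difference between the two runs is where feature selection happens relative to the train/test split, which is the kind of small, easy-to-miss analysis decision the critic sessions look for.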

Links

REPRODUCIBILITY CRISIS

P-HACKING (AND RELATED COMMON DISASTERS IN STATISTICAL HYPOTHESIS TESTING)

RESEARCH SCANDALS

OTHER CLASSES

EARLY 20TH CENTURY STATISTICS

EXPLORATORY DATA ANALYSIS

STATS/ML

STATS/ML BOOKS

MODEL OVERFITTING / INTERPOLATIVE MEMORIZATION (AKA DOUBLE DESCENT)

CAUSAL INFERENCE

PRE-16TH CENTURY SCIENCE & PROTO-SCIENCE

PRE-MODERN STATISTICS

POST-16TH CENTURY EMPIRICAL SCIENCE (WITHOUT MUCH STATISTICS)