PH295-lab

PB HLTH 295, Big Data Seminar (Fall 2016, UC Berkeley)

PB HLTH 295: Big Data Seminar material

Theme: Targeted Learning in the era of Big Data

This course provides both theoretical and practical tools for analyzing big data generated by modern biomedical studies. It should therefore interest Ph.D. students in quantitative fields who want to learn about recent theoretical developments in the area, as well as MA students who want to learn practical skills for analyzing big data.

We will discuss problems arising from traditional data analysis in biomedical big data settings and study the targeted learning roadmap for causal inference as a solution to these problems. We will cover fundamental topics in causal inference, including causal models, the definition of causal quantities that answer scientific questions of interest, and the causal assumptions under which a causal quantity can be identified from the observed data. Specific questions of interest that will be covered include precision medicine, stochastic interventions, and time-to-event outcomes. We will discuss practical tools for estimating causal quantities using state-of-the-art machine learning techniques, including the SuperLearner and h2oEnsemble R packages. We will discuss how such techniques can be used to construct asymptotically efficient estimators through one-step estimation and targeted minimum loss-based estimation (TMLE), and how these estimators facilitate the construction of scalable confidence intervals and statistical hypothesis tests. Finally, we will discuss recent extensions of super learning and TMLE to the online estimation setting, which provide statistical estimation and inference for arbitrarily large data sets.

Schedule

  • Week 1: No Lab
  • Week 2: Intro to R
  • Week 3: Make-up Lecture 2
  • Week 4: Guest speaker 1 (Oleg Sofrygin) -- simcausal
  • Week 5: Bias-Variance Tradeoff
  • Week 6: SuperLearner: Part I
  • Week 7: SuperLearner: Part II
  • Week 8: Review
  • Week 9: h2oEnsemble (Chris Kennedy) + SuperLearner: Part III (David)
  • Week 10: Git + Amazon EC2
  • Week 11: Guest speaker 2 (BRC) -- Savio & parallelization
  • Week 12: TMLE
  • Week 13: No Lab (Thanksgiving)

License

© 2016 Mark J. van der Laan, David C Benkeser, Wilson Cai

This repository is licensed under the MIT license. See LICENSE for details.