Link to course evaluation (anonymous)

Link to course evaluation.

Welcome to the course

Welcome to the course: Targeted Minimum Loss-based Estimation (TMLE) for Causal Inference (in Biostatistics). The course starts on Tuesday the 6th of June, 2023 (see course location).

Please read the following instructions carefully.

  1. The practicals will mainly consist of computer exercises. To participate, you should bring your own laptop.
  2. A list of relevant R packages for the practicals can be found here: Relevant R packages.
  3. For an overview of the course, see: Overview of course.
  4. To prepare for the lectures, see: Reading plan.

If you have any questions or comments, feel free to email me: hely@sund.ku.dk.

Course dates and location

The course will take place in the following rooms of CSS:

Tuesday    June 6th  9:00–15:00  CSS 7.0.08
Wednesday  June 7th  9:00–15:00  CSS 7.0.18
Thursday   June 8th  9:00–15:00  CSS 7.0.18
Friday     June 9th  9:00–15:00  CSS 7.0.18

CSS is located at Øster Farimagsgade 5, 1353 Kbh K. See also https://samf.ku.dk/kontakt/findvej/. Note that we have changed to room CSS 7.0.18 from Wednesday onward.

Overview of course

Targeted minimum loss-based estimation (TMLE) is a general framework for estimation of causal effects that combines semiparametric efficiency theory and machine learning in a two-step procedure. The main focus of the course is to understand the overall concepts, the theory, and the application of TMLE. A sufficient background in mathematics and statistics is needed, although we emphasize that the aim is to make the theory practical (thus, many mathematical details will be skipped). For the larger part of the course, we focus on the simple example of estimating an average treatment effect; the general principles are similar for other parameters.

The course runs over four full days (9am–3pm, lunch from 12–1pm), planned largely as follows:

Day 1.
On day 1 we go through the roadmap of targeted learning (both from a theoretical and a practical angle) and give a brief introduction to basic concepts of causal inference and targeted nonparametric inference.
Day 2.
On day 2 we introduce the TMLE more specifically. We go through the targeting step, its purpose and how it is carried out for an average treatment effect. The afternoon will cover super learning.
Day 3.
On day 3 we will dig a bit further into the theoretical basis. In the afternoon we will move on to causal effects in time-varying settings.
Day 4.
On day 4 we consider identification and targeting in time-varying settings, where time-dependent confounding hinders the use of “classical” statistical methods.
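
To give a flavour of the two-step procedure for an average treatment effect (the topic of day 2), here is a minimal sketch in base R. It is an illustration under stated assumptions, not the course material itself: the data are simulated, all variable names are hypothetical, and plain logistic regressions stand in for the super learner when forming the initial estimates.

```r
# Minimal TMLE sketch for an average treatment effect (ATE), base R only.
# Everything here is simulated and hypothetical; glm() replaces super learning.
set.seed(1)
n <- 2000
W <- rnorm(n)                                   # baseline covariate
A <- rbinom(n, 1, plogis(0.5 * W))              # binary treatment
Y <- rbinom(n, 1, plogis(-0.5 + A + 0.8 * W))   # binary outcome
dat <- data.frame(W = W, A = A, Y = Y)

# Step 1: initial estimates of the outcome regression and the propensity score
Q_fit <- glm(Y ~ A + W, family = binomial(), data = dat)
Q1 <- predict(Q_fit, newdata = transform(dat, A = 1), type = "response")
Q0 <- predict(Q_fit, newdata = transform(dat, A = 0), type = "response")
QA <- ifelse(dat$A == 1, Q1, Q0)
g_fit <- glm(A ~ W, family = binomial(), data = dat)
g1 <- predict(g_fit, type = "response")         # estimated P(A = 1 | W)

# Step 2: targeting, i.e. a one-parameter logistic fluctuation of the
# initial outcome fit along the "clever covariate" H
dat$H <- dat$A / g1 - (1 - dat$A) / (1 - g1)
dat$logitQA <- qlogis(QA)
eps_fit <- glm(Y ~ -1 + H + offset(logitQA), family = binomial(), data = dat)
eps <- unname(coef(eps_fit))

# Updated (targeted) predictions under A = 1 and A = 0
Q1_star <- plogis(qlogis(Q1) + eps / g1)
Q0_star <- plogis(qlogis(Q0) - eps / (1 - g1))
ate <- mean(Q1_star - Q0_star)

# Standard error and 95% confidence interval from the estimated
# efficient influence function
QA_star <- ifelse(dat$A == 1, Q1_star, Q0_star)
ic <- dat$H * (dat$Y - QA_star) + Q1_star - Q0_star - ate
se <- sd(ic) / sqrt(n)
ci <- ate + c(-1, 1) * qnorm(0.975) * se

cat("ATE estimate:", ate, "\n")
cat("95% CI:", ci[1], "to", ci[2], "\n")
```

After the targeting step, the empirical mean of `ic` is numerically zero, which is what "solving the efficient influence function equation" means in practice.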

Reading plan

For Day 1 we recommend that you read:

Hines, O., Dukes, O., Diaz-Ordaz, K., & Vansteelandt, S. (2022). Demystifying statistical learning based on efficient influence functions. The American Statistician, 76(3), 292-304.

You can focus on Section 5. Moreover, the following provides a valuable introduction to the concepts of nonparametric efficiency theory that we need to understand TMLE:

Kennedy, E. H. (2016). Semiparametric theory and empirical processes in causal inference. In Statistical causal inferences and their applications in public health research (pp. 141-167). Springer, Cham.

You may want to focus on pages 1–13, although Section 4.1 is also quite useful. The same author has more recently written a very nice review that you may also want to look into:

Kennedy, E. H. (2022). Semiparametric doubly robust targeted double machine learning: a review. arXiv preprint arXiv:2203.06469.

For Day 2, you may read Chapter 5 of:

van der Laan, M. J., & Rose, S. (2011). Targeted learning: causal inference for observational and experimental data. Springer Science & Business Media.

You can use this as an introduction to TMLE. You may further read Chapter 3 of the same book as an introduction to super learning.

For Day 4, you should read:

Kreif, N., Tran, L., Grieve, R., De Stavola, B., Tasker, R. C., & Petersen, M. (2017). Estimating the comparative effectiveness of feeding interventions in the pediatric intensive care unit: a demonstration of longitudinal targeted maximum likelihood estimation. American Journal of Epidemiology, 186(12), 1370-1379.

We will use this paper in one of the practicals. You may further read:

Lendle, S. D., Schwab, J., Petersen, M. L., & van der Laan, M. J. (2017). ltmle: an R package implementing targeted minimum loss-based estimation for longitudinal data. Journal of Statistical Software, 81(1), 1-21.

This is the software paper for the ltmle (longitudinal TMLE) R package (skip Section 4 on marginal structural models).

Relevant R packages

install.packages("tmle")
install.packages("ggplot2")
install.packages("data.table")
install.packages("randomForestSRC")
install.packages("SuperLearner")
install.packages("ltmle")