CV-TMLE vs TMLE

Question

olivierlabayle opened this issue a year ago · 0 comments

Hello,

I am following the tutorial and trying to look at the difference between CV-TMLE and TMLE with the perinatal dataset.

To keep things simple, I use only a glm as the model for both the propensity score and the outcome mean. I am surprised to see that the output is exactly the same for both procedures. The CV-TMLE run complains about glm not being "CV-aware", which might be the reason, but I don't understand why that should matter. My understanding of CV-TMLE is that:

The dataset is split into V folds.
The glm models (for both A and Y) are fitted on each training split, so there are V instances of each glm, each trained on a different split.
The targeting step is pooled: it uses the predictions of the V glm model pairs on their respective validation sets.
The final estimate is the average of the estimates across validation folds.
The variance is estimated from the influence curve (I am not entirely sure whether it is pooled across validation samples or whether multiple variance estimates are made and averaged).
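
To make the steps above concrete, here is a minimal base-R sketch of that understanding on simulated data. It is not how tmle3 implements CV-TMLE. The data-generating process, the variable names, V = 5, the logistic fluctuation, and the choice to pool the influence curve across validation samples are all assumptions made for illustration.

```r
set.seed(1)
n <- 500
V <- 5

# Simulated data (hypothetical, not the perinatal dataset)
W <- rnorm(n)
A <- rbinom(n, 1, plogis(0.5 * W))
Y <- rbinom(n, 1, plogis(-0.5 + A + 0.5 * W))
dat <- data.frame(W = W, A = A, Y = Y)

# 1. Split the data into V folds of equal size
fold <- sample(rep(seq_len(V), length.out = n))

# 2. Fit both GLMs on each training split and predict on the held-out fold,
#    so every observation gets out-of-fold initial estimates
g1 <- Q1 <- Q0 <- numeric(n)
for (v in seq_len(V)) {
  train <- dat[fold != v, ]
  valid <- dat[fold == v, ]
  g_fit <- glm(A ~ W, family = binomial(), data = train)
  Q_fit <- glm(Y ~ A + W, family = binomial(), data = train)
  g1[fold == v] <- predict(g_fit, newdata = valid, type = "response")
  Q1[fold == v] <- predict(Q_fit, newdata = transform(valid, A = 1), type = "response")
  Q0[fold == v] <- predict(Q_fit, newdata = transform(valid, A = 0), type = "response")
}
QA <- ifelse(dat$A == 1, Q1, Q0)

# 3. Pooled targeting step: a single fluctuation parameter is fitted on all
#    out-of-fold predictions at once
H1 <- 1 / g1
H0 <- -1 / (1 - g1)
HA <- ifelse(dat$A == 1, H1, H0)
eps_fit <- glm(dat$Y ~ -1 + HA + offset(qlogis(QA)), family = binomial())
eps <- unname(coef(eps_fit))
Q1_star <- plogis(qlogis(Q1) + eps * H1)
Q0_star <- plogis(qlogis(Q0) + eps * H0)
QA_star <- ifelse(dat$A == 1, Q1_star, Q0_star)

# 4. Final estimate: average of the per-fold plug-in estimates
fold_est <- tapply(Q1_star - Q0_star, fold, mean)
psi <- mean(fold_est)

# 5. Influence curve pooled over all validation samples (an assumption here)
ic <- HA * (dat$Y - QA_star) + (Q1_star - Q0_star) - psi
se <- sqrt(var(ic) / n)

c(estimate = psi, se = se)
```

Because the folds have equal size in this sketch, averaging the per-fold estimates in step 4 gives the same number as averaging over all observations at once. The only difference from an ordinary TMLE is step 2, where the initial predictions come from models that never saw the observation being predicted.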

As I understand it, we could have used a Super Learner instead of a GLM, which would have added a nested cross-validation procedure, but Super Learning is not a requirement of CV-TMLE. The code to reproduce is below: you can change learner_list to use a Super Learner, in which case the two procedures return different outputs and no "CV-aware" complaint is raised.

I would appreciate some clarification on the procedure and why this is happening! Thanks!

library(data.table)
library(tmle3)
library(sl3)

data = read.csv("perinatal.csv")

node_list <- list(
  W = c(
    "apgar1", "apgar5", "gagebrth", "mage", "meducyrs", "sexn"
  ),
  A = "parity01",
  Y = "haz01"
)

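# Candidate learners: a plain GLM, and a Super Learner stacking the GLM with a mean learner
glm <- Lrnr_glm$new()
lrn_mean <- Lrnr_mean$new()
sl <- Lrnr_sl$new(learners = Stack$new(glm, lrn_mean), metalearner = Lrnr_nnls$new())

# Use the GLM for both the propensity score (A) and the outcome mean (Y)
learner_list <- list(A = glm, Y = glm)
# Alternative: use the Super Learner for both
# learner_list <- list(A = sl, Y = sl)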

ate_spec <- tmle_ATE(
  treatment_level = 1,
  control_level = 0
)

tmle_task <- ate_spec$make_tmle_task(data, node_list)
initial_likelihood <- ate_spec$make_initial_likelihood(
  tmle_task,
  learner_list
)

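# Targeted likelihood with the default updater (intended as the CV-TMLE case)
targeted_likelihood_cv <- Targeted_Likelihood$new(initial_likelihood)

# Same initial likelihood with cvtmle = FALSE (intended as the plain TMLE case)
targeted_likelihood_no_cv <-
  Targeted_Likelihood$new(initial_likelihood,
    updater = list(cvtmle = FALSE)
  )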

tmle_params_cv <- ate_spec$make_params(tmle_task, targeted_likelihood_cv)
tmle_params_no_cv <- ate_spec$make_params(tmle_task, targeted_likelihood_no_cv)

tmle_no_cv <- fit_tmle3(
  tmle_task, targeted_likelihood_no_cv, tmle_params_no_cv,
  targeted_likelihood_no_cv$updater
)
tmle_no_cv
# -0.1855909

tmle_cv <- fit_tmle3(
  tmle_task, targeted_likelihood_cv, tmle_params_cv,
  targeted_likelihood_cv$updater
)
tmle_cv
# -0.1855909