This repository provides code that implements the state learner. We provide examples of how to use it as a stand-alone tool and in combination with targeted learning.
To run the examples below, run the following code to load the needed functions.
library(here)
library(targets)
tar_source(here("R-code/functions"))
We illustrate how to use the state learner by fitting it to the Melanoma
data
set as provided by the riskRegression
package.
library(riskRegression)
data(Melanoma, package="riskRegression")
setDT(Melanoma)
head(Melanoma)
time status event invasion ici epicel ulcer thick sex age logthick 1: 10 2 death.other.causes level.1 2 present present 6.76 Male 76 1.9110229 2: 30 2 death.other.causes level.0 0 not present not present 0.65 Male 56 -0.4307829 3: 35 0 censored level.1 2 not present not present 1.34 Male 41 0.2926696 4: 99 2 death.other.causes level.0 2 not present not present 2.90 Female 71 1.0647107 5: 185 1 death.malignant.melanoma level.2 2 present present 12.08 Male 52 2.4915512 6: 204 1 death.malignant.melanoma level.2 2 not present present 4.84 Male 28 1.5769147
We specify a list of learners. Each learner is specified through a list with a
model
entry (for now, cox
, GLMnet
, and rfsrc
are implemented), and a
x_form
entry, which provides a formula for which and how the covariates enter
the model. Additional arguments can be supplied to each learner by adding
entries to the list (e.g., ntree
for a random survival forest).
library(glmnet)
library(randomForestSRC)
learners <- list(
cox = list(model = "cox", x_form = ~sex+age+logthick),
cox_penalty = list(model = "GLMnet", x_form = ~invasion+ici+epicel+ulcer+sex+age+logthick),
km = list(model = "cox", x_form = ~1),
km_strat = list(model = "cox", x_form = ~strata(epicel)),
rf = list(model = "rfsrc", x_form = ~invasion+ici+epicel+ulcer+sex+age+logthick, ntree = 50)
)
Below we use the same list of learners to learn the cumulative hazards for cause
1, cause 2, and censoring. The state learner returns a list
consisting of 1) a
data.table
of triples of learners sorted according to their joint performance
in predicting the state of the observed data in the interval 0 to time
, and 2)
the top ranked triple of models fitted to the full data. Here we use an interval
of 5 years and show the 6 highest ranked triples.
set.seed(111)
sl = statelearner(learners = list(cause1 = learners,
cause2 = learners,
censor = learners),
data = Melanoma,
time = 5*365.25)
head(sl$cv_fit)
cause1 cause2 censor loss b 1: rf km cox 239.6142 1 2: rf km cox_penalty 239.8218 1 3: rf km km 239.8678 1 4: rf cox_penalty cox 239.9478 1 5: rf km rf 239.9732 1 6: rf cox_penalty cox_penalty 240.1687 1
We estimate the standardized risk difference between patient with and without ulcers for both death of malignant melanoma (cause=1) and death from other causes (cause=2). To do so, we need an estimate of the “treatment” mechanism, i.e., the probability of the presence of ulcers given other covariates.
Melanoma[,A := as.numeric(ulcer)-1]
treat_fit <- GLMnet(A ~ invasion+ici+epicel+sex+age+logthick, family = binomial, data = Melanoma)
We can now estimate the ATE using the fitted nuisance models.
ate_est <- os_abs_risk_ate(data = Melanoma,
eval_times = c(2,5)*365.25,
fit_1 = sl$fitted_winners$cause1,
fit_2 = sl$fitted_winners$cause2,
fit_cens = sl$fitted_winners$censor,
fit_treat = treat_fit)
Below we show the estimated risk difference for
ate_est[effect=="ATE" & est_type=="one-step"]
cause time effect est_type est see lower upper 1: cause1 730.50 ATE one-step 0.05638786 0.02677092 0.00391686 0.1088589 2: cause2 730.50 ATE one-step 0.08763156 0.08163485 -0.07237275 0.2476359 3: cause1 1826.25 ATE one-step 0.05699237 0.05786330 -0.05641970 0.1704044 4: cause2 1826.25 ATE one-step 0.10442644 0.08226031 -0.05680378 0.2656567
We simulate data according to the Zelefsky study and evaluate different super learners.
set.seed(22)
start_t = 1
end_t = 36
time_inc = (end_t-start_t)/100
eval_times = seq(start_t, end_t, time_inc)
train_data <- simZelefsky_wrapper(n = 500, simulation_input = zelefsky_summary)[, !c("true_time", "cens_time")]
eval_data <- simZelefsky_wrapper(n = 1e4, simulation_input = zelefsky_summary)[, !c("time", "status")]
sim_est <- eval_sl(train_data, eval_data, eval_times = seq(1, 36, length.out = 100))
The scaled integrated Brier scores are obtained as follows.
sim_est[!(type == "cens" & grepl("ipcw", SL)),
.(scaled_int_brier = sum(100*brier)*time_inc/end_t),
.(type, SL)]
type SL scaled_int_brier 1: event survSL 9.951644 2: cens survSL 13.723653 3: event statelearner 9.826543 4: cens statelearner 14.055743 5: event ipcw_km 9.826543 6: event ipcw_cox 9.826543 7: event oracle 9.826543 8: cens oracle 13.725703