Predictions for survival analysis based on median time
Opened this issue · 2 comments
Hi,
I have a similar issue - and I don't think it can be solved on the x side.
In my case, I have a parametric survival model (estimated using 'survreg' from the survival package), and what I want is the marginal predicted median survival time, not the marginal predicted mean survival time. In Stata, the margins command allows this fairly easily, with different functions for the predictions. From the Stata manual:
margins, predict(median time)
gives predictions based on:
margins, predict(mean time)
gives predictions based on:
I can also get median predictions using the 'predict.survreg' function:
predict(fit,type="quantile",p=0.5).
But that won't easily give me marginal (counterfactual) SEs.
And it doesn't look like I can pass the 'p' argument through when using ggeffects. I can pass the type="quantile" argument in ggaverage, but it estimates the 10th and 90th percentiles (the default). When I try to pass p=0.5 through ggeffects, it gives the error:
Error in sanity_equivalence_p_adjust(equivalence, p_adjust) :
Assertion on 'p_adjust' failed: Must be element of set {'holm','hochberg','hommel','bonferroni','BH','BY','fdr','none'}, but types do not match (numeric != character).
Is there any way to pass that 'p=0.5' through?
Originally posted by @philipclare in #553 (comment)
Do you have a reproducible example? Not quite sure what you would like to achieve. Predicted values when time is split at the median?
I can demonstrate using the 'lung' dataset, included in the survival package.
library(survival)
head(lung)
inst time status age sex ph.ecog ph.karno pat.karno meal.cal wt.loss
1 3 306 2 74 1 1 90 100 1175 NA
2 3 455 2 68 1 0 90 90 1225 15
3 3 1010 1 56 1 0 90 90 NA 15
4 5 210 2 57 1 1 90 60 1150 11
5 1 883 2 60 1 0 100 90 NA 0
6 12 1022 1 74 1 1 50 80 513 0
If I load that data into Stata, run a fairly simple model model, and then ask for marginal predictions of median survival time over sex, I get:
. streg phecog age i.sex, distr(weibull) time nolog
Failure _d: status
Analysis time _t: time
Weibull AFT regression
No. of subjects = 227 Number of obs = 227
No. of failures = 164
Time at risk = 69,522
LR chi2(3) = 29.98
Log likelihood = -262.47304 Prob > chi2 = 0.0000
------------------------------------------------------------------------------
_t | Coefficient Std. err. z P>|z| [95% conf. interval]
-------------+----------------------------------------------------------------
phecog | -.3396381 .0834784 -4.07 0.000 -.5032528 -.1760234
age | -.0074754 .0067635 -1.11 0.269 -.0207317 .0057808
2.sex | .4010905 .1237326 3.24 0.001 .1585792 .6436019
_cons | 6.674526 .4274013 15.62 0.000 5.836835 7.512217
-------------+----------------------------------------------------------------
/ln_p | .3131927 .0613465 5.11 0.000 .1929559 .4334296
-------------+----------------------------------------------------------------
p | 1.367785 .0839088 1.212829 1.542539
1/p | .731109 .0448509 .6482819 .8245183
------------------------------------------------------------------------------
.
. margins i.sex, predict(median time)
Predictive margins Number of obs = 227
Model VCE: OIM
Expression: Predicted median _t, predict(median time)
------------------------------------------------------------------------------
| Delta-method
| Margin std. err. z P>|z| [95% conf. interval]
-------------+----------------------------------------------------------------
sex |
1 | 284.5261 21.71691 13.10 0.000 241.9618 327.0905
2 | 424.9263 44.34047 9.58 0.000 338.0206 511.832
------------------------------------------------------------------------------
If I run the same model in R using survreg, and then ask for margins from predict_response, I get:
> fit <- survreg(Surv(time, status) ~ phecog + age + sex, lung,dist="weibull")
> summary(fit)
Call:
survreg(formula = Surv(time, status) ~ phecog + age + sex, data = lung,dist = "weibull")
Value Std. Error z p
(Intercept) 6.27344 0.45358 13.83 < 2e-16
phecog -0.33964 0.08348 -4.07 4.7e-05
age -0.00748 0.00676 -1.11 0.2690
sex 0.40109 0.12373 3.24 0.0012
Log(scale) -0.31319 0.06135 -5.11 3.3e-07
Scale= 0.731
Weibull distribution
Loglik(model)= -1132.4 Loglik(intercept only)= -1147.4
Chisq= 29.98 on 3 degrees of freedom, p= 1.4e-06
Number of Newton-Raphson Iterations: 5
n=227 (1 observation deleted due to missingness)
> predict_response(fit,
+ terms="sex",
+ margin="counterfactual")
# Average predicted values of Surv(time, status)
sex | Predicted | 95% CI
--------------------------------
1 | 371.96 | 309.17, 434.75
2 | 555.50 | 503.87, 607.13
Which gives exactly the same model fit/parameters, but different marginal predictions, because the underlying prediction is not the median survival time.
I can get the same predictions in R, using base predict with different 'newdata':
> newdat_m <- data.frame(cbind(time=lung$time,status=lung$status,phecog=lung$phecog,age=lung$age,sex=1))
> mean(predict(fit,newdata=newdat_m,type="quantile",p=0.5),na.rm=TRUE)
[1] 284.5261
> newdat_f <- data.frame(cbind(time=lung$time,status=lung$status,phecog=lung$phecog,age=lung$age,sex=2))
> mean(predict(fit,newdata=newdat_f,type="quantile",p=0.5),na.rm=TRUE)
[1] 424.9263
I can also get quantile marginal predictions from predict_response - but only the 'default' quantiles (10% and 90%):
> predict_response(fit,
+ terms="sex",
+ margin="counterfactual",
+ type="quantile")
# Average predicted values of Surv(time, status)
Surv(time, status): 1
sex | Predicted | 95% CI
---------------------------------
1 | 71.77 | 64.65, 78.90
2 | 107.19 | 101.98, 112.40
Surv(time, status): 2
sex | Predicted | 95% CI
---------------------------------
1 | 684.41 | 576.70, 792.11
2 | 1022.13 | 898.18, 1146.08
> colMeans(predict(fit,newdata=newdat_m,type="quantile"),na.rm=TRUE)
[1] 71.7740 684.4083
> colMeans(predict(fit,newdata=newdat_f,type="quantile"),na.rm=TRUE)
[1] 107.1911 1022.1313
So it seems like all I need is to be able to pass that 'p=0.5' option that I would use in 'predict.survreg' through 'predict_response'. But when I try that, it just ignores it.