therneau/survival

plot.cox.zph vs. survminer::ggcoxzph

Closed this issue · 1 comments

Dear @therneau
I plot the result of the cox.zph using the plotting function from the survival and survminer package. I get different results, attached below. Which one is correct?

I am preparing for writing a paper and I cannot risk. That's why I report the inconsistency. Both survival and survminer packages are very popular and I believe many people base their scientific work on them.

data <- structure(list(time = c(1, 7, 8, 41, 80, 85, 11, 61, 8, 2, 98, 
5, 4, 42, 23, 44, 5, 29, 13, 4, 7, 19, 9, 6, 5, 13, 6, 1, 30, 
38, 57, 87, 8, 22, 7, 16, 58, 6, 18, 55, 93, 15, 21, 28, 81, 
34, 10, 79, 2, 66, 55, 10, 6, 62, 58, 2, 42, 22, 5, 54, 57, 23, 
42, 10, 9, 27, 9, 5, 5, 40, 6, 11, 37, 101, 25, 108, 76, 38, 
13, 9, 41, 17, 11, 99, 27, 15, 2, 71, 7, 75, 36, 12, 74, 18, 
74, 16, 44, 107, 12, 41, 9, 18, 3, 11, 100, 42, 20, 15, 65, 11, 
53, 58, 20, 6, 12, 81, 53, 44, 5, 2, 26, 57, 80, 50, 22, 79, 
41, 12, 15, 42, 25, 64, 24, 83, 66, 18, 59, 14, 8, 20, 20, 12, 
30, 40, 11, 42, 42, 17, 44, 7, 2, 62), status = c(1L, 1L, 1L, 
1L, 0L, 0L, 1L, 0L, 1L, 1L, 0L, 1L, 1L, 0L, 1L, 1L, 1L, 1L, 1L, 
0L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 0L, 0L, 1L, 1L, 1L, 
1L, 1L, 1L, 0L, 1L, 0L, 1L, 1L, 1L, 0L, 1L, 1L, 0L, 1L, 0L, 0L, 
1L, 1L, 0L, 0L, 1L, 0L, 1L, 1L, 0L, 1L, 1L, 0L, 1L, 1L, 1L, 1L, 
1L, 1L, 0L, 1L, 1L, 0L, 0L, 1L, 0L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
0L, 1L, 1L, 1L, 0L, 1L, 0L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 0L, 0L, 1L, 1L, 0L, 1L, 1L, 0L, 1L, 1L, 0L, 
1L, 1L, 0L, 1L, 1L, 1L, 0L, 0L, 1L, 1L, 0L, 0L, 1L, 1L, 0L, 1L, 
0L, 1L, 1L, 1L, 1L, 0L, 1L, 1L, 0L, 1L, 1L, 1L, 0L, 0L, 1L, 0L, 
1L, 1L, 1L, 1L, 0L), RNR = structure(c(2L, 2L, 1L, 2L, 1L, 1L, 
1L, 1L, 2L, 2L, 1L, 2L, 2L, 2L, 1L, 1L, 2L, 1L, 1L, 2L, 2L, 1L, 
1L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 2L, 2L, 1L, 
2L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 1L, 
1L, 2L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 2L, 2L, 2L, 1L, 2L, 2L, 1L, 
1L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 
2L, 1L, 2L, 1L, 1L, 1L, 2L, 2L, 1L, 2L, 1L, 1L, 1L, 2L, 2L, 1L, 
2L, 1L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 
2L, 2L, 2L, 2L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 
1L, 2L, 1L, 2L, 2L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 1L, 2L, 
2L, 1L), .Label = c("A", "B"), class = "factor"), Gender = c("Male", 
"Male", "Male", "Male", "Female", "Male", "Male", "Female", "Female", 
"Male", "Female", "Male", "Male", "Female", "Male", "Female", 
"Male", "Female", "Male", "Male", "Female", "Male", "Male", "Female", 
"Female", "Male", "Male", "Male", "Female", "Male", "Male", "Male", 
"Male", "Female", "Male", "Male", "Male", "Male", "Male", "Female", 
"Female", "Male", "Male", "Male", "Male", "Female", "Male", "Female", 
"Female", "Female", "Male", "Male", "Female", "Male", "Female", 
"Male", "Male", "Male", "Male", "Female", "Male", "Male", "Male", 
"Female", "Male", "Male", "Male", "Male", "Male", "Male", "Female", 
"Male", "Male", "Female", "Female", "Female", "Male", "Male", 
"Male", "Male", "Female", "Female", "Male", "Female", "Male", 
"Male", "Male", "Male", "Female", "Female", "Female", "Male", 
"Male", "Male", "Male", "Male", "Male", "Female", "Female", "Male", 
"Male", "Male", "Male", "Female", "Female", "Male", "Male", "Female", 
"Male", "Male", "Male", "Female", "Male", "Male", "Male", "Female", 
"Male", "Male", "Female", "Female", "Male", "Female", "Male", 
"Male", "Male", "Male", "Female", "Male", "Female", "Female", 
"Male", "Male", "Male", "Male", "Male", "Male", "Male", "Male", 
"Male", "Female", "Male", "Male", "Female", "Male", "Female", 
"Female", "Female", "Female", "Male", "Male", "Female", "Female"
), AgeBin = structure(c(1L, 2L, 2L, 1L, 2L, 1L, 2L, 2L, 2L, 1L, 
1L, 2L, 2L, 1L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 2L, 1L, 2L, 
2L, 2L, 2L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 2L, 2L, 1L, 2L, 2L, 2L, 
2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 2L, 1L, 
1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 2L, 1L, 2L, 1L, 2L, 2L, 1L, 1L, 
1L, 1L, 1L, 2L, 2L, 2L, 1L, 2L, 1L, 2L, 2L, 1L, 2L, 1L, 1L, 1L, 
2L, 2L, 2L, 2L, 1L, 2L, 1L, 2L, 2L, 1L, 2L, 2L, 1L, 1L, 1L, 1L, 
2L, 2L, 1L, 2L, 2L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 2L, 1L, 2L, 2L, 
2L, 2L, 1L, 2L, 1L, 1L, 2L, 2L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 
2L, 1L, 2L, 2L, 2L, 1L, 2L, 1L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("<65", 
">=65"), class = "factor")), row.names = c(NA, -152L), class = "data.frame")

The fit:

fit   <- coxph(Surv(time, status) ~ RNR + Gender + AgeBin, data = x)
summary(fit)

Call:
coxph(formula = Surv(time, status) ~ RNR + Gender + AgeBin, data = x)

  n= 152, number of events= 108 

             coef exp(coef) se(coef)     z Pr(>|z|)    
RNRB       1.1317    3.1008   0.2082 5.434  5.5e-08 ***
GenderMale 0.3592    1.4322   0.2243 1.602    0.109    
AgeBin>=65 0.3820    1.4652   0.2023 1.888    0.059 .  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

           exp(coef) exp(-coef) lower .95 upper .95
RNRB           3.101     0.3225    2.0617     4.664
GenderMale     1.432     0.6982    0.9228     2.223
AgeBin>=65     1.465     0.6825    0.9856     2.178

Concordance= 0.688  (se = 0.025 )
Likelihood ratio test= 43.03  on 3 df,   p=2e-09
Wald test            = 43.83  on 3 df,   p=2e-09
Score (logrank) test = 48.64  on 3 df,   p=2e-10

The test of PH:

cox.zph(fit)

       chisq df    p
RNR    1.215  1 0.27
Gender 2.099  1 0.15
AgeBin 0.367  1 0.54
GLOBAL 4.921  3 0.18

Let's print it:

layout(1:3)
par(mar=c(2,4,1,1))
invisible(sapply(1:3, function(i) plot(cox.zph(fit)[i])))

obraz

As we can notice - most of the points lie outside the dashed range.

Now, let's plot it using the survminer package:

obraz

And now all the points lie inside the intervals. The intervals look identical and span the same (visually) range.
So one routine has to be wrong. Or both are correct, but a different setting/option is in use. Or maybe I am missing something trivial.

This is the code: https://github.com/kassambara/survminer/blob/master/R/ggcoxzph.R

and, for a convenience, I extracted a few parts responsible for the se:

if (se) {
    bk <- backsolve(qmat$qr[1:df, 1:df], diag(df))
    xtx <- bk %*% t(bk)
    seval <- d * ((pmat %*% xtx) * pmat) %*% rep(1, df)
  }

...
lapply(var, function(i) {
    invisible(round(x$table[i, 3],4) -> pval)
    ggplot() + labs(title = paste0('Schoenfeld Individual Test p: ', pval)) + ggtheme -> gplot
    y <- yy[, i]
    yhat <- as.vector(pmat %*% qr.coef(qmat, y))
    if (resid)
      yr <- range(yhat, y)
    else yr <- range(yhat)
    if (se) {
      temp <- as.vector(2 * sqrt(x$var[i, i] * seval))
      yup <- yhat + temp
      ylow <- yhat - temp
      yr <- range(yr, yup, ylow)
    }
....
 if (se) {
      gplot <- gplot + geom_line(aes(x=pred.x, y=yup), lty = "dashed") +
        geom_line(aes( x = pred.x, y = ylow), lty = "dashed")
    }

Your code is correct. There was an issue in SurvMiner. I'm sorry for the false alarm and closing.
kassambara/survminer#534