plot.cox.zph vs. survminer::ggcoxzph
Closed this issue · 1 comments
Dear @therneau
I plot the result of the cox.zph using the plotting function from the survival and survminer package. I get different results, attached below. Which one is correct?
I am preparing for writing a paper and I cannot risk. That's why I report the inconsistency. Both survival and survminer packages are very popular and I believe many people base their scientific work on them.
data <- structure(list(time = c(1, 7, 8, 41, 80, 85, 11, 61, 8, 2, 98,
5, 4, 42, 23, 44, 5, 29, 13, 4, 7, 19, 9, 6, 5, 13, 6, 1, 30,
38, 57, 87, 8, 22, 7, 16, 58, 6, 18, 55, 93, 15, 21, 28, 81,
34, 10, 79, 2, 66, 55, 10, 6, 62, 58, 2, 42, 22, 5, 54, 57, 23,
42, 10, 9, 27, 9, 5, 5, 40, 6, 11, 37, 101, 25, 108, 76, 38,
13, 9, 41, 17, 11, 99, 27, 15, 2, 71, 7, 75, 36, 12, 74, 18,
74, 16, 44, 107, 12, 41, 9, 18, 3, 11, 100, 42, 20, 15, 65, 11,
53, 58, 20, 6, 12, 81, 53, 44, 5, 2, 26, 57, 80, 50, 22, 79,
41, 12, 15, 42, 25, 64, 24, 83, 66, 18, 59, 14, 8, 20, 20, 12,
30, 40, 11, 42, 42, 17, 44, 7, 2, 62), status = c(1L, 1L, 1L,
1L, 0L, 0L, 1L, 0L, 1L, 1L, 0L, 1L, 1L, 0L, 1L, 1L, 1L, 1L, 1L,
0L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 0L, 0L, 1L, 1L, 1L,
1L, 1L, 1L, 0L, 1L, 0L, 1L, 1L, 1L, 0L, 1L, 1L, 0L, 1L, 0L, 0L,
1L, 1L, 0L, 0L, 1L, 0L, 1L, 1L, 0L, 1L, 1L, 0L, 1L, 1L, 1L, 1L,
1L, 1L, 0L, 1L, 1L, 0L, 0L, 1L, 0L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
0L, 1L, 1L, 1L, 0L, 1L, 0L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 0L, 0L, 1L, 1L, 0L, 1L, 1L, 0L, 1L, 1L, 0L,
1L, 1L, 0L, 1L, 1L, 1L, 0L, 0L, 1L, 1L, 0L, 0L, 1L, 1L, 0L, 1L,
0L, 1L, 1L, 1L, 1L, 0L, 1L, 1L, 0L, 1L, 1L, 1L, 0L, 0L, 1L, 0L,
1L, 1L, 1L, 1L, 0L), RNR = structure(c(2L, 2L, 1L, 2L, 1L, 1L,
1L, 1L, 2L, 2L, 1L, 2L, 2L, 2L, 1L, 1L, 2L, 1L, 1L, 2L, 2L, 1L,
1L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 2L, 2L, 1L,
2L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 1L,
1L, 2L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 2L, 2L, 2L, 1L, 2L, 2L, 1L,
1L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L,
2L, 1L, 2L, 1L, 1L, 1L, 2L, 2L, 1L, 2L, 1L, 1L, 1L, 2L, 2L, 1L,
2L, 1L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 1L,
2L, 2L, 2L, 2L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L,
1L, 2L, 1L, 2L, 2L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 1L, 2L,
2L, 1L), .Label = c("A", "B"), class = "factor"), Gender = c("Male",
"Male", "Male", "Male", "Female", "Male", "Male", "Female", "Female",
"Male", "Female", "Male", "Male", "Female", "Male", "Female",
"Male", "Female", "Male", "Male", "Female", "Male", "Male", "Female",
"Female", "Male", "Male", "Male", "Female", "Male", "Male", "Male",
"Male", "Female", "Male", "Male", "Male", "Male", "Male", "Female",
"Female", "Male", "Male", "Male", "Male", "Female", "Male", "Female",
"Female", "Female", "Male", "Male", "Female", "Male", "Female",
"Male", "Male", "Male", "Male", "Female", "Male", "Male", "Male",
"Female", "Male", "Male", "Male", "Male", "Male", "Male", "Female",
"Male", "Male", "Female", "Female", "Female", "Male", "Male",
"Male", "Male", "Female", "Female", "Male", "Female", "Male",
"Male", "Male", "Male", "Female", "Female", "Female", "Male",
"Male", "Male", "Male", "Male", "Male", "Female", "Female", "Male",
"Male", "Male", "Male", "Female", "Female", "Male", "Male", "Female",
"Male", "Male", "Male", "Female", "Male", "Male", "Male", "Female",
"Male", "Male", "Female", "Female", "Male", "Female", "Male",
"Male", "Male", "Male", "Female", "Male", "Female", "Female",
"Male", "Male", "Male", "Male", "Male", "Male", "Male", "Male",
"Male", "Female", "Male", "Male", "Female", "Male", "Female",
"Female", "Female", "Female", "Male", "Male", "Female", "Female"
), AgeBin = structure(c(1L, 2L, 2L, 1L, 2L, 1L, 2L, 2L, 2L, 1L,
1L, 2L, 2L, 1L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 2L, 1L, 2L,
2L, 2L, 2L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 2L, 2L, 1L, 2L, 2L, 2L,
2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 2L, 1L,
1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 2L, 1L, 2L, 1L, 2L, 2L, 1L, 1L,
1L, 1L, 1L, 2L, 2L, 2L, 1L, 2L, 1L, 2L, 2L, 1L, 2L, 1L, 1L, 1L,
2L, 2L, 2L, 2L, 1L, 2L, 1L, 2L, 2L, 1L, 2L, 2L, 1L, 1L, 1L, 1L,
2L, 2L, 1L, 2L, 2L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 2L, 1L, 2L, 2L,
2L, 2L, 1L, 2L, 1L, 1L, 2L, 2L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 1L,
2L, 1L, 2L, 2L, 2L, 1L, 2L, 1L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("<65",
">=65"), class = "factor")), row.names = c(NA, -152L), class = "data.frame")
The fit:
fit <- coxph(Surv(time, status) ~ RNR + Gender + AgeBin, data = x)
summary(fit)
Call:
coxph(formula = Surv(time, status) ~ RNR + Gender + AgeBin, data = x)
n= 152, number of events= 108
coef exp(coef) se(coef) z Pr(>|z|)
RNRB 1.1317 3.1008 0.2082 5.434 5.5e-08 ***
GenderMale 0.3592 1.4322 0.2243 1.602 0.109
AgeBin>=65 0.3820 1.4652 0.2023 1.888 0.059 .
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
exp(coef) exp(-coef) lower .95 upper .95
RNRB 3.101 0.3225 2.0617 4.664
GenderMale 1.432 0.6982 0.9228 2.223
AgeBin>=65 1.465 0.6825 0.9856 2.178
Concordance= 0.688 (se = 0.025 )
Likelihood ratio test= 43.03 on 3 df, p=2e-09
Wald test = 43.83 on 3 df, p=2e-09
Score (logrank) test = 48.64 on 3 df, p=2e-10
The test of PH:
cox.zph(fit)
chisq df p
RNR 1.215 1 0.27
Gender 2.099 1 0.15
AgeBin 0.367 1 0.54
GLOBAL 4.921 3 0.18
Let's print it:
layout(1:3)
par(mar=c(2,4,1,1))
invisible(sapply(1:3, function(i) plot(cox.zph(fit)[i])))
As we can notice - most of the points lie outside the dashed range.
Now, let's plot it using the survminer package:
And now all the points lie inside the intervals. The intervals look identical and span the same (visually) range.
So one routine has to be wrong. Or both are correct, but a different setting/option is in use. Or maybe I am missing something trivial.
This is the code: https://github.com/kassambara/survminer/blob/master/R/ggcoxzph.R
and, for a convenience, I extracted a few parts responsible for the se:
if (se) {
bk <- backsolve(qmat$qr[1:df, 1:df], diag(df))
xtx <- bk %*% t(bk)
seval <- d * ((pmat %*% xtx) * pmat) %*% rep(1, df)
}
...
lapply(var, function(i) {
invisible(round(x$table[i, 3],4) -> pval)
ggplot() + labs(title = paste0('Schoenfeld Individual Test p: ', pval)) + ggtheme -> gplot
y <- yy[, i]
yhat <- as.vector(pmat %*% qr.coef(qmat, y))
if (resid)
yr <- range(yhat, y)
else yr <- range(yhat)
if (se) {
temp <- as.vector(2 * sqrt(x$var[i, i] * seval))
yup <- yhat + temp
ylow <- yhat - temp
yr <- range(yr, yup, ylow)
}
....
if (se) {
gplot <- gplot + geom_line(aes(x=pred.x, y=yup), lty = "dashed") +
geom_line(aes( x = pred.x, y = ylow), lty = "dashed")
}
Your code is correct. There was an issue in SurvMiner. I'm sorry for the false alarm and closing.
kassambara/survminer#534