rvlenth/emmeans

Emmeans no longer recognizes stratified factors in Survival

mdjpe opened this issue · 13 comments

I sent the below to Terry Therneau (maintainer of the survival package) and he gave an unhelpful response:

"Why are you asking me about a failure in another package? Ask the author of emmeans."

As an ordinary user, all I know is that script which used to work doesn't work now. Could you take a look?

Survival models with a stratified factor appear to have stopped cooperating with emmeans.

For example, the following model, which was fully functional on R for Windows last year (and was used in my published work), currently produces errors when run in R for Ubuntu (I don't currently have R on a Windows machine, so can't check that):

cphfr <- coxph(Surv(tstart, tstop, death) ~ strata(Region) * Patties + Fume, newBHPsurv, cluster=ColonyGroup)

emm <- emmeans(cphfr, ~ Patties | Region)
seem <- summary(pairs(emm, reverse = T), by = NULL, type="response", adjust = "none", infer = c(T, T))
names(seem)[1] <- "Patties effect"
pander(seem)

Error in emmeans(cphfr, ~Patties | Region) :
No variable named Region in the reference grid
In addition: Warning message:
In model.matrix.default(trms, m, contrasts.arg = object$contrasts) :
variable 'strata(Region)' is absent, its contrast will be ignored

The above definitely worked in:

R version 4.2.2 (2022-10-31 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19044)

survival_3.4-0 emmeans_1.8.2

And is not working in:

R version 4.3.3 (2024-02-29)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 22.04.3 LTS
[1] survival_3.5-7 emmeans_1.10.0

There was a change in emmeans version 1.8.8 in relation to issue #429. It seems to be exactly the reverse of what you are reporting. Can you review that issue please, and let me know how yours differs?

Note

I edited and deleted from your initial submission my instructions for reporting issues. The idea is for you to use the same sectioning (the prompts that start with ##) and replace the instructions with your comments relevant to those instructions. I would appreciate your following that practice in the future.

PS -- I cannot reproduce your issue because I do not have the dataset newBPHsurv. My issue guidelines clearly say that I need a reproducible example. So I will close this issue until you re-open it and either provide the dataset or prepare a similar example using data I can access, e.g. a dataset in the survey package. (Please re-open this issue; do not create a new issue.)

I don't see a re-open button, so maybe the comments section will do.

Here is a reproducible example (cox model borrowed from Therneau: "A package for survival analysis in R". My copy is dated Aug 5, 2022.):

require(survival)
Loading required package: survival
require(emmeans)
Loading required package: emmeans

cfit2 <- coxph(Surv(time, status) ~ age + sex + wt.loss + strata(inst), data=lung)

emm <- emmeans(cfit2, ~ inst)
Error in emmeans(cfit2, ~inst) :
No variable named inst in the reference grid

Specifying strata(inst) instead does not help:

emm <- emmeans(cfit2, ~ strata(inst))
Error in emmeans(cfit2, ~strata(inst)) :
No variable named inst in the reference grid

Other variables in the model produce the expected results:

emm <- emmeans(cfit2, ~ wt.loss)
emm
wt.loss emmean SE df asymp.LCL asymp.UCL
9.78 0.679 0.736 Inf -0.763 2.12

Replacing strata(inst) with inst in the model allows emmeans to be generated (but presumably violates the proportional hazards assumption)

cfit222 <- coxph(Surv(time, status) ~ age + sex + wt.loss + inst, data=lung)
emm <- emmeans(cfit222, ~ inst)
emm
inst emmean SE df asymp.LCL asymp.UCL
11 0.41 0.679 Inf -0.92 1.74

My own case requires an interaction between a treatment and a stratified factor. Emmeans are not available for either the stratified factor or the interacting treatment. For example,

cfit222dd <- coxph(Surv(time, status) ~ strata(sex)*age, data=lung)
emm <- emmeans(cfit222dd, ~ sex)

Error in emmeans(cfit222dd, ~sex) :
No variable named sex in the reference grid
In addition: Warning message:
In model.matrix.default(trms, m, contrasts.arg = object$contrasts) :
variable 'strata(sex)' is absent, its contrast will be ignored

emm <- emmeans(cfit222dd, ~ age)
Warning message:
In model.matrix.default(trms, m, contrasts.arg = object$contrasts) :
variable 'strata(sex)' is absent, its contrast will be ignored
emm
Error in X[ii, ii, drop = FALSE] %*% y[ii] : non-conformable arguments

Below is the full output from sessionInfo()

sessionInfo()
R version 4.3.3 (2024-02-29)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 22.04.3 LTS

Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.10.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0

locale:
[1] LC_CTYPE=en_CA.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_CA.UTF-8 LC_COLLATE=en_CA.UTF-8
[5] LC_MONETARY=en_CA.UTF-8 LC_MESSAGES=en_CA.UTF-8
[7] LC_PAPER=en_CA.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_CA.UTF-8 LC_IDENTIFICATION=C

time zone: America/Winnipeg
tzcode source: system (glibc)

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] emmeans_1.10.0 survival_3.5-7

loaded via a namespace (and not attached):
[1] compiler_4.3.3 Matrix_1.6-5 estimability_1.4.1 tools_4.3.3
[5] coda_0.19-4.1 mvtnorm_1.2-4 splines_4.3.3 grid_4.3.3
[9] xtable_1.8-4 lattice_0.22-5

An issue can be reopened via the "Reopen with comment" button just next to the "Comment" button below the comment compose window. Similarly, the issue can be closed with the "Close with comment" button. I think that since you are listed as a "participant" on this issue, I think that you can reopen it; and I think I have experienced this happening with past issue submitters. But maybe I'm wrong.

I asked you to look at issue #429, but you offer no comment on that. To tell the truth, I am surprised that your example worked in the past, due to that very bug. I suppose it has to do with the interaction. And if so (see below), some coeffficients are not available, and it's possible that what seemed to "work" in fact didn't, so that the results obtained are nonsense.

This seems to be a no-win situation. I fixed #429 on July 10, 2023 by disallowing strata factors. Restoring strata factors will possibly fix the problem here, but will revive the bug in #429.

Meanwhile, I've looked at various articles on the web about strata in Cox regression, and several of them say that you cannot do inference about the stratification factors. So that, along with other considerations, suggests to me that it is correct to not allow stratification factors in the reference grid. My intuition suggests that strata are something akin to random block effects in a linear model. This seems to be borne-out by the coefficient of inst not being included in the summary:

> summary(cfit2)
Call:
coxph(formula = Surv(time, status) ~ age + sex + wt.loss + strata(inst), 
    data = lung)

  n= 213, number of events= 151 
   (15 observations deleted due to missingness)

             coef exp(coef)  se(coef)      z Pr(>|z|)
age      0.023535  1.023814  0.010415  2.260  0.02383
sex     -0.516036  0.596882  0.190617 -2.707  0.00679
wt.loss -0.001698  0.998303  0.006848 -0.248  0.80413

        exp(coef) exp(-coef) lower .95 upper .95
age        1.0238     0.9767    1.0031    1.0449
sex        0.5969     1.6754    0.4108    0.8672
wt.loss    0.9983     1.0017    0.9850    1.0118

Concordance= 0.637  (se = 0.034 )
Likelihood ratio test= 14.06  on 3 df,   p=0.003
Wald test            = 13.32  on 3 df,   p=0.004
Score (logrank) test = 13.67  on 3 df,   p=0.003

For the other model, we get

> summary(cfit222dd)
Call:
coxph(formula = Surv(time, status) ~ strata(sex) * age, data = lung)

  n= 228, number of events= 165 

                          coef exp(coef)  se(coef)      z Pr(>|z|)
age                   0.019064  1.019247  0.011353  1.679   0.0931
strata(sex)sex=2:age -0.008353  0.991682  0.019336 -0.432   0.6657

                     exp(coef) exp(-coef) lower .95 upper .95
age                     1.0192     0.9811    0.9968     1.042
strata(sex)sex=2:age    0.9917     1.0084    0.9548     1.030

Concordance= 0.546  (se = 0.026 )
Likelihood ratio test= 3.37  on 2 df,   p=0.2
Wald test            = 3.29  on 2 df,   p=0.2
Score (logrank) test = 3.29  on 2 df,   p=0.2

This summary does include coefficients for the interaction, but not a main effect of strata(sex). But I wonder if this is just an oversight, or if this model makes sense. I don't know.

Adding further confusion is that strata() is just a function, and though its documentation says that it is designed for use with coxph models, we could in fact use it with any model, e.g. one fitted with lm():

> junk <- lm(time ~ age + sex + wt.loss + strata(inst), data = lung)
> emmeans(junk, "sex")
 sex emmean   SE  df lower.CL upper.CL
   1    276 25.3 192      226      326
   2    336 26.3 192      284      388
>    . . .
> junk2 <- lm(time ~ age + sex + wt.loss + inst, data = lung)
> emmeans(junk2, "sex")
 sex emmean   SE  df lower.CL upper.CL
   1    294 19.0 208      256      331
   2    344 23.2 208      298      390

The model junk2 makes some sense, at least mechanically, whereas junk itself is allowed, but doesn't make sense and yields different results. Something like that could explain why I was at first mistaken in allowing stratification factors as factors of inferential interest.

The bottom line

I cannot allow stratification factors as factors of inferential interest in Cox models, because even if it's appropriate to do so in the thinking of some users, we don't have regression coefficients, nor variances and covariances for them. I do not know why it seemed to work in earlier versions, but I am guessing that those results are suspect.

OK, here's the deal. I don't know what's right -- even after going through the user manual for survival and looking at all 136 cases where the word "strata" is used (many, to my irritation, being grammatical errors where the author meant "stratum"). So I decided to play along with it. That means allowing any factors in strata in the reference grid but deleting columns of the X matrix if there are no corresponding coefficients. So now, everything runs without error, as demonstrated below. But please note that we can have no main effect of a strata() factor, so if you have strata(x) as an additive term in a model, you get the same results for each level of x -- and if you compare levels of x, you'll get zeros.

The changes I made also side-step the bug observed in #429.

Just keep in mind that I'm not keeping you from doing anything that might be nonsensical -- it's on you to do appropriate analyses.

## What the hell to do about `strata()`?

library(survival)
library(emmeans)

#### Additive model
moda <- coxph(Surv(time, status) ~ wt.loss + age + strata(sex), data = lung)
#### Interaction model
modi <- coxph(Surv(time, status) ~ wt.loss + age*strata(sex), data = lung)
### survreg additive model
sra <- survreg(Surv(time, status) ~ wt.loss + age + strata(sex), data = lung)
### survreg interactive model
sri <- survreg(Surv(time, status) ~ wt.loss + age * strata(sex), data = lung)

at = list(age = c(20,30))

emmeans(moda, ~ age*sex, at = at)
##  age sex emmean    SE  df asymp.LCL asymp.UCL
##   20   1  0.386 0.203 Inf  -0.01142     0.783
##   30   1  0.578 0.296 Inf  -0.00178     1.158
##   20   2  0.386 0.203 Inf  -0.01142     0.783
##   30   2  0.578 0.296 Inf  -0.00178     1.158
## 
## Results are given on the log (not the response) scale. 
## Confidence level used: 0.95

emmeans(modi, ~ age*sex, at = at)
##  age sex emmean    SE  df asymp.LCL asymp.UCL
##   20   1  0.459 0.247 Inf   -0.0255     0.943
##   30   1  0.688 0.364 Inf   -0.0259     1.402
##   20   2  0.247 0.332 Inf   -0.4042     0.897
##   30   2  0.369 0.492 Inf   -0.5956     1.335
## 
## Results are given on the log (not the response) scale. 
## Confidence level used: 0.95

emmeans(sra, ~ age*sex, at = at)
##  age sex emmean    SE  df lower.CL upper.CL
##   20   1   6.70 0.311 209     6.09     7.32
##   30   1   6.56 0.242 209     6.08     7.04
##   20   2   6.70 0.311 209     6.09     7.32
##   30   2   6.56 0.242 209     6.08     7.04
## 
## Results are given on the log (not the response) scale. 
## Confidence level used: 0.95

emmeans(sri, ~ age*sex, at = at)
##  age sex emmean    SE  df lower.CL upper.CL
##   20   1   6.58 0.310 208     5.97     7.20
##   30   1   6.43 0.243 208     5.95     6.91
##   20   2   6.70 0.308 208     6.09     7.31
##   30   2   6.61 0.241 208     6.13     7.08
## 
## Results are given on the log (not the response) scale. 
## Confidence level used: 0.95

Created on 2024-04-04 with reprex v2.0.2

Hi, Dr. Lenth. Following your revisions, I have run the script from my previous work and it returns the results that it used to, which is correct according to my understanding. That is, it gives estimates for the non-stratified factor at each level of the stratified factor, and they are the same estimates as previously. Thank you for your help.

Hi, Dr. Lenth. I recently reinstalled R, emmeans, and survival, and this problem seems to have come back. Once again, emmeans is not producing the expected results with stratified factors in Survival.

Reproducible example:

require(survival)
require(emmeans)

cfit2 <- coxph(Surv(time, status) ~ age + sex + wt.loss + strata(inst),
data=lung)
emm <- emmeans(cfit2, ~ inst)

Error in emmeans(cfit2, ~inst) :
No variable named inst in the reference grid

This case is correct since the model doesn't produce an estimate for inst (or strata(inst)).

However, a model such as

cfit222dd <- coxph(Surv(time, status) ~ strata(sex)*age, data=lung)
emm <- emmeans(cfit222dd, ~ age)
emm <- emmeans(cfit222dd, ~ age | sex)

should produce estimates for age (upper line) or for age at each level of sex (lower line). Instead, it gives

Warning message:
In model.matrix.default(trms, m, contrasts.arg = object$contrasts) :
variable 'strata(sex)' is absent, its contrast will be ignored

emm
Error in X[ii, ii, drop = FALSE] %*% y[ii] : non-conformable arguments

This appears to be exactly the same problem that you sorted for me back in the spring; I am surprised to see it come back since the version of emmeans (1.10.0) has not changed. I did recently re-install it. However, I am now running R 4.3.3 and survival_3.7-0; perhaps recent updates to one of those have caused a problem.

Don't know what to tell you. All this code works fine for me.

I suggest quitting R and starting over with a new session.