Unexpected outputs using `add_global_p()` with `crr()`
fdehrich opened this issue · 2 comments
Hello,
Re-posting this issue from {gtsummary} since it was identified as a better fit for {tidycmprsk}.
When using add_global_p()
with crr()
, I noticed that, in some cases, releveling a 4-level categorical variable (keeping the reference category the same) causes the global p-value to change. More broadly, in these instances, the global p-value appears to consistently match the individual p-value corresponding to the third level for some reason.
I was able to recreate this issue using the tidycmprsk::trial
dataset (I had to add some extra variables and used the death variable as a predictor which doesn't make sense, but helped me recreate for some reason). Below, the global p-value for new_categorical
changes from 0.6 to 0.4 when the variable is releveled (but keeping A
as the reference category).
Thank you!
library(tidycmprsk)
# Set seed as there seems to be some randomness involved in the estimation
set.seed(123)
# Update trial data to include additional variables
trial_updated =
trial %>%
dplyr::mutate(
new_binary = sample(c(0, 1),
replace = TRUE,
size = nrow(trial)),
new_categorical = sample(c("A", "B", "C", "D"),
replace = TRUE,
size = nrow(trial))
)
# Generate table
crr(Surv(ttdeath, death_cr) ~ .,
failcode = "death from cancer",
cencode = "censor",
trial_updated) %>%
gtsummary::tbl_regression(exp = TRUE) %>%
gtsummary::add_global_p(keep = TRUE) %>%
gtsummary::as_kable()
#> 27 cases omitted due to missing values
Characteristic | HR | 95% CI | p-value |
---|---|---|---|
Chemotherapy Treatment | 0.7 | ||
Drug A | — | — | |
Drug B | 1.32 | 0.67, 2.59 | 0.4 |
Age | 1.01 | 0.98, 1.03 | 0.4 |
Marker Level (ng/mL) | 1.20 | 0.81, 1.77 | 0.5 |
T Stage | 0.2 | ||
T1 | — | — | |
T2 | 0.68 | 0.23, 2.01 | 0.5 |
T3 | 1.88 | 0.75, 4.68 | 0.2 |
T4 | 1.10 | 0.45, 2.69 | 0.8 |
Grade | 0.4 | ||
I | — | — | |
II | 1.90 | 0.86, 4.19 | 0.11 |
III | 0.97 | 0.49, 1.94 | >0.9 |
Tumor Response | 0.88 | 0.42, 1.83 | <0.001 |
Patient Died | 113,638 | 66,593, 193,919 | 0.8 |
new_binary | 1.10 | 0.62, 1.95 | 0.4 |
new_categorical | 0.6 | ||
A | — | — | |
B | 1.49 | 0.62, 3.56 | 0.4 |
C | 0.77 | 0.30, 1.99 | 0.6 |
D | 0.96 | 0.41, 2.27 | >0.9 |
# Relevel `new_categorical` variable
trial_updated =
trial_updated %>%
dplyr::mutate(
new_categorical = forcats::fct_relevel(
new_categorical,
"A", "C", "B", "D"
)
)
# Regenerate table
crr(Surv(ttdeath, death_cr) ~ .,
failcode = "death from cancer",
cencode = "censor",
trial_updated) %>%
gtsummary::tbl_regression(exp = TRUE) %>%
gtsummary::add_global_p(keep = TRUE) %>%
gtsummary::as_kable()
#> 27 cases omitted due to missing values
Characteristic | HR | 95% CI | p-value |
---|---|---|---|
Chemotherapy Treatment | 0.7 | ||
Drug A | — | — | |
Drug B | 1.32 | 0.67, 2.59 | 0.4 |
Age | 1.01 | 0.98, 1.03 | 0.4 |
Marker Level (ng/mL) | 1.20 | 0.81, 1.77 | 0.5 |
T Stage | 0.2 | ||
T1 | — | — | |
T2 | 0.68 | 0.23, 2.01 | 0.5 |
T3 | 1.88 | 0.75, 4.68 | 0.2 |
T4 | 1.10 | 0.45, 2.69 | 0.8 |
Grade | 0.4 | ||
I | — | — | |
II | 1.90 | 0.86, 4.19 | 0.11 |
III | 0.97 | 0.49, 1.94 | >0.9 |
Tumor Response | 0.88 | 0.42, 1.83 | <0.001 |
Patient Died | 113,638 | 66,593, 193,919 | 0.8 |
new_binary | 1.10 | 0.62, 1.95 | 0.6 |
new_categorical | 0.4 | ||
A | — | — | |
C | 0.77 | 0.30, 1.99 | 0.6 |
B | 1.49 | 0.62, 3.56 | 0.4 |
D | 0.96 | 0.41, 2.27 | >0.9 |
Created on 2023-04-25 with reprex v2.0.2
Confirmed that the individual p-values presented in tdycmprsk::crr() %>% gtsummary::tbl_regression()
match those in the underlying cmprsk::crr
object ✅
library(tidycmprsk)
# Set seed for data generation
set.seed(123)
# Update trial data to include additional variables
trial_updated =
trial %>%
dplyr::mutate(
new_binary = sample(c(0, 1),
replace = TRUE,
size = nrow(trial)),
new_categorical = sample(c("A", "B", "C", "D"),
replace = TRUE,
size = nrow(trial))
)
# Fit model
fit =
tidycmprsk::crr(Surv(ttdeath, death_cr) ~ .,
failcode = "death from cancer",
cencode = "censor",
trial_updated)
#> 27 cases omitted due to missing values
# Display individual p-values from tidycmprsk
fit %>%
gtsummary::tbl_regression(exp = TRUE) %>%
gtsummary::as_kable()
Characteristic | HR | 95% CI | p-value |
---|---|---|---|
Chemotherapy Treatment | |||
Drug A | — | — | |
Drug B | 1.32 | 0.67, 2.59 | 0.4 |
Age | 1.01 | 0.98, 1.03 | 0.6 |
Marker Level (ng/mL) | 1.20 | 0.81, 1.77 | 0.4 |
T Stage | |||
T1 | — | — | |
T2 | 0.68 | 0.23, 2.01 | 0.5 |
T3 | 1.88 | 0.75, 4.68 | 0.2 |
T4 | 1.10 | 0.45, 2.69 | 0.8 |
Grade | |||
I | — | — | |
II | 1.90 | 0.86, 4.19 | 0.11 |
III | 0.97 | 0.49, 1.94 | >0.9 |
Tumor Response | 0.88 | 0.42, 1.83 | 0.7 |
Patient Died | 113,638 | 66,593, 193,919 | <0.001 |
new_binary | 1.10 | 0.62, 1.95 | 0.8 |
new_categorical | |||
A | — | — | |
B | 1.49 | 0.62, 3.56 | 0.4 |
C | 0.77 | 0.30, 1.99 | 0.6 |
D | 0.96 | 0.41, 2.27 | >0.9 |
# Display individual p-values from from cmprsk::crr object stored internally
fit$cmprsk
#> convergence: FALSE
#> coefficients:
#> trtDrug B age marker stageT2
#> 0.274200 0.007546 0.178700 -0.391300
#> stageT3 stageT4 gradeII gradeIII
#> 0.630400 0.098060 0.640000 -0.026930
#> response death new_binary new_categoricalB
#> -0.127100 11.640000 0.091740 0.399100
#> new_categoricalC new_categoricalD
#> -0.256000 -0.040160
#> standard errors:
#> [1] 0.34480 0.01264 0.20110 0.55500 0.46570 0.45500 0.40390 0.35240 0.37190
#> [10] 0.27270 0.29380 0.44400 0.48200 0.43840
#> two-sided p-values:
#> trtDrug B age marker stageT2
#> 0.43 0.55 0.37 0.48
#> stageT3 stageT4 gradeII gradeIII
#> 0.18 0.83 0.11 0.94
#> response death new_binary new_categoricalB
#> 0.73 0.00 0.75 0.37
#> new_categoricalC new_categoricalD
#> 0.60 0.93
Created on 2023-04-27 with reprex v2.0.2
Opening a more specific issue