MSKCC-Epi-Bio/tidycmprsk

Unexpected outputs using `add_global_p()` with `crr()`

fdehrich opened this issue · 2 comments

Hello,

Re-posting this issue from {gtsummary} since it was identified as a better fit for {tidycmprsk}.

When using add_global_p() with crr(), I noticed that, in some cases, releveling a 4-level categorical variable (keeping the reference category the same) causes the global p-value to change. More broadly, in these instances, the global p-value appears to consistently match the individual p-value corresponding to the third level for some reason.

I was able to recreate this issue using the tidycmprsk::trial dataset (I had to add some extra variables and used the death variable as a predictor which doesn't make sense, but helped me recreate for some reason). Below, the global p-value for new_categorical changes from 0.6 to 0.4 when the variable is releveled (but keeping A as the reference category).

Thank you!

  library(tidycmprsk)

# Set seed as there seems to be some randomness involved in the estimation
set.seed(123)
  
# Update trial data to include additional variables
trial_updated =
  trial %>% 
  dplyr::mutate(
    new_binary = sample(c(0, 1), 
                        replace = TRUE, 
                        size = nrow(trial)),
    new_categorical = sample(c("A", "B", "C", "D"), 
                             replace = TRUE, 
                             size = nrow(trial))
  )

# Generate table
crr(Surv(ttdeath, death_cr) ~ ., 
    failcode = "death from cancer",
    cencode = "censor",
    trial_updated) %>%
  gtsummary::tbl_regression(exp = TRUE) %>%
  gtsummary::add_global_p(keep = TRUE) %>% 
  gtsummary::as_kable()
#> 27 cases omitted due to missing values
Characteristic HR 95% CI p-value
Chemotherapy Treatment 0.7
Drug A
Drug B 1.32 0.67, 2.59 0.4
Age 1.01 0.98, 1.03 0.4
Marker Level (ng/mL) 1.20 0.81, 1.77 0.5
T Stage 0.2
T1
T2 0.68 0.23, 2.01 0.5
T3 1.88 0.75, 4.68 0.2
T4 1.10 0.45, 2.69 0.8
Grade 0.4
I
II 1.90 0.86, 4.19 0.11
III 0.97 0.49, 1.94 >0.9
Tumor Response 0.88 0.42, 1.83 <0.001
Patient Died 113,638 66,593, 193,919 0.8
new_binary 1.10 0.62, 1.95 0.4
new_categorical 0.6
A
B 1.49 0.62, 3.56 0.4
C 0.77 0.30, 1.99 0.6
D 0.96 0.41, 2.27 >0.9
# Relevel `new_categorical` variable
trial_updated = 
  trial_updated %>% 
  dplyr::mutate(
    new_categorical = forcats::fct_relevel(
      new_categorical,
      "A", "C", "B", "D"
    )
  )

# Regenerate table
crr(Surv(ttdeath, death_cr) ~ ., 
    failcode = "death from cancer",
    cencode = "censor",
    trial_updated) %>%
  gtsummary::tbl_regression(exp = TRUE) %>%
  gtsummary::add_global_p(keep = TRUE) %>% 
  gtsummary::as_kable()
#> 27 cases omitted due to missing values
Characteristic HR 95% CI p-value
Chemotherapy Treatment 0.7
Drug A
Drug B 1.32 0.67, 2.59 0.4
Age 1.01 0.98, 1.03 0.4
Marker Level (ng/mL) 1.20 0.81, 1.77 0.5
T Stage 0.2
T1
T2 0.68 0.23, 2.01 0.5
T3 1.88 0.75, 4.68 0.2
T4 1.10 0.45, 2.69 0.8
Grade 0.4
I
II 1.90 0.86, 4.19 0.11
III 0.97 0.49, 1.94 >0.9
Tumor Response 0.88 0.42, 1.83 <0.001
Patient Died 113,638 66,593, 193,919 0.8
new_binary 1.10 0.62, 1.95 0.6
new_categorical 0.4
A
C 0.77 0.30, 1.99 0.6
B 1.49 0.62, 3.56 0.4
D 0.96 0.41, 2.27 >0.9

Created on 2023-04-25 with reprex v2.0.2

Confirmed that the individual p-values presented in tdycmprsk::crr() %>% gtsummary::tbl_regression() match those in the underlying cmprsk::crr object ✅

library(tidycmprsk)

# Set seed for data generation
set.seed(123)

# Update trial data to include additional variables
trial_updated =
  trial %>% 
  dplyr::mutate(
    new_binary = sample(c(0, 1), 
                        replace = TRUE, 
                        size = nrow(trial)),
    new_categorical = sample(c("A", "B", "C", "D"), 
                             replace = TRUE, 
                             size = nrow(trial))
  )

# Fit model
fit = 
  tidycmprsk::crr(Surv(ttdeath, death_cr) ~ ., 
                  failcode = "death from cancer",
                  cencode = "censor",
                  trial_updated)
#> 27 cases omitted due to missing values

# Display individual p-values from tidycmprsk
fit %>%
  gtsummary::tbl_regression(exp = TRUE) %>%
  gtsummary::as_kable()
Characteristic HR 95% CI p-value
Chemotherapy Treatment
Drug A
Drug B 1.32 0.67, 2.59 0.4
Age 1.01 0.98, 1.03 0.6
Marker Level (ng/mL) 1.20 0.81, 1.77 0.4
T Stage
T1
T2 0.68 0.23, 2.01 0.5
T3 1.88 0.75, 4.68 0.2
T4 1.10 0.45, 2.69 0.8
Grade
I
II 1.90 0.86, 4.19 0.11
III 0.97 0.49, 1.94 >0.9
Tumor Response 0.88 0.42, 1.83 0.7
Patient Died 113,638 66,593, 193,919 <0.001
new_binary 1.10 0.62, 1.95 0.8
new_categorical
A
B 1.49 0.62, 3.56 0.4
C 0.77 0.30, 1.99 0.6
D 0.96 0.41, 2.27 >0.9
# Display individual p-values from from cmprsk::crr object stored internally
fit$cmprsk
#> convergence:  FALSE 
#> coefficients:
#>        trtDrug B              age           marker          stageT2 
#>         0.274200         0.007546         0.178700        -0.391300 
#>          stageT3          stageT4          gradeII         gradeIII 
#>         0.630400         0.098060         0.640000        -0.026930 
#>         response            death       new_binary new_categoricalB 
#>        -0.127100        11.640000         0.091740         0.399100 
#> new_categoricalC new_categoricalD 
#>        -0.256000        -0.040160 
#> standard errors:
#>  [1] 0.34480 0.01264 0.20110 0.55500 0.46570 0.45500 0.40390 0.35240 0.37190
#> [10] 0.27270 0.29380 0.44400 0.48200 0.43840
#> two-sided p-values:
#>        trtDrug B              age           marker          stageT2 
#>             0.43             0.55             0.37             0.48 
#>          stageT3          stageT4          gradeII         gradeIII 
#>             0.18             0.83             0.11             0.94 
#>         response            death       new_binary new_categoricalB 
#>             0.73             0.00             0.75             0.37 
#> new_categoricalC new_categoricalD 
#>             0.60             0.93

Created on 2023-04-27 with reprex v2.0.2

Opening a more specific issue