sahirbhatnagar/casebase

Is the offset inverted?

Closed this issue · 2 comments

On ERSPC:
#> [1] "ratio of 1:100: 3.25138419124016"
#> [1] "ratio of 1:10: 5.55396928423421"

Based on the equation in Hanley's paper, I expect the offset to be ln(b/B) (or b/B before taking the log). A larger ratio implies a larger b and therefore a larger offset, but the offsets in this small example go the other way. Unless we invert it later on?

library(casebase)
#> See example usage at http://sahirbhatnagar.com/casebase/
set.seed(1)
data("ERSPC")
mod_cb_glm <- fitSmoothHazard(DeadOfPrCa ~ Follow.Up.Time + ScrArm,
                              data = ERSPC,
                              time = "Follow.Up.Time", ratio = 100)
print(paste("ratio of 1:100:",mod_cb_glm$offset[1]))
#> [1] "ratio of 1:100: 3.25138419124016"
mod_cb_glm <- fitSmoothHazard(DeadOfPrCa ~ Follow.Up.Time + ScrArm,
                              data = ERSPC,
                              time = "Follow.Up.Time", ratio = 10)
print(paste("ratio of 1:10:",mod_cb_glm$offset[1]))
#> [1] "ratio of 1:10: 5.55396928423421"








sessionInfo()
#> R version 4.0.3 (2020-10-10)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 10 x64 (build 18363)
#> 
#> Matrix products: default
#> 
#> locale:
#> [1] LC_COLLATE=English_United States.1252 
#> [2] LC_CTYPE=English_United States.1252   
#> [3] LC_MONETARY=English_United States.1252
#> [4] LC_NUMERIC=C                          
#> [5] LC_TIME=English_United States.1252    
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] casebase_0.9.0
#> 
#> loaded via a namespace (and not attached):
#>  [1] knitr_1.30        magrittr_1.5      splines_4.0.3     tidyselect_1.1.0 
#>  [5] munsell_0.5.0     lattice_0.20-41   colorspace_1.4-1  R6_2.4.1         
#>  [9] rlang_0.4.8       dplyr_1.0.2       stringr_1.4.0     highr_0.8        
#> [13] tools_4.0.3       grid_4.0.3        nlme_3.1-149      data.table_1.13.2
#> [17] gtable_0.3.0      mgcv_1.8-33       xfun_0.18         htmltools_0.5.0  
#> [21] ellipsis_0.3.1    survival_3.2-7    yaml_2.2.1        digest_0.6.26    
#> [25] tibble_3.0.4      lifecycle_0.2.0   crayon_1.3.4      Matrix_1.2-18    
#> [29] purrr_0.3.4       ggplot2_3.3.2     vctrs_0.3.4       VGAM_1.1-4       
#> [33] glue_1.4.2        evaluate_0.14     rmarkdown_2.5     stringi_1.5.3    
#> [37] compiler_4.0.3    pillar_1.4.6      generics_0.0.2    scales_1.1.1     
#> [41] stats4_4.0.3      pkgconfig_2.0.3

Created on 2020-11-07 by the reprex package (v0.3.0)

Looking into sampleCaseBase, the offset is indeed log(B/b). Should it be log(b/B)?

I'm not sure which equation you're referring to in Hanley & Miettinen, but they explicitly write the offset should be log(B/b):

[Image: excerpt from Hanley & Miettinen giving the offset as log(B/b)]

That makes sense: you want the offset to go to zero as you increase the size of the base series (b approaches B, so log(B/b) shrinks), because you're getting closer and closer to using all the information.
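A quick sanity check on the numbers in the reprex, as a sketch rather than a statement about casebase internals. It assumes the base series size is b = ratio × (number of cases), so going from ratio = 10 to ratio = 100 should lower log(B/b) by log(10). The values of `B` and `n_cases` in the second half are hypothetical and not taken from ERSPC.

```r
# Offsets reported in the reprex (first element of mod_cb_glm$offset).
offset_ratio_100 <- 3.25138419124016
offset_ratio_10  <- 5.55396928423421

# If offset = log(B / b) and b = ratio * (number of cases) [assumption],
# a tenfold larger ratio should lower the offset by log(10).
offset_gap <- offset_ratio_10 - offset_ratio_100
print(offset_gap)
print(log(10))
print(isTRUE(all.equal(offset_gap, log(10))))

# Hypothetical totals, for illustration only (not ERSPC values).
B <- 1e6        # total person-time in the study base
n_cases <- 500  # number of events in the case series
b <- c(10, 100, 1000, 2000) * n_cases  # base series size at each ratio

# The offset shrinks as the base series grows and reaches 0 when b == B.
offsets <- log(B / b)
print(round(offsets, 3))
```

The gap between the two reported offsets matches log(10) to within floating-point tolerance, which is what log(B/b) predicts and the opposite of what log(b/B) would give.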