sahirbhatnagar/casebase

poptime bug

Closed this issue · 3 comments

There seems to be a bug in popTime where the cases are distributed over only a restricted range of the plot.

 
library(survival)
library(casebase)
#> Warning: package 'casebase' was built under R version 4.0.2
#> See example usage at http://sahirbhatnagar.com/casebase/
library(simsurv)
#> 
#> Attaching package: 'simsurv'
#> The following object is masked from 'package:casebase':
#> 
#>     brcancer
samples <- 400
cov <- data.frame(id = 1:samples, source = rep("original", samples))

# Simulate the event times
dat <- simsurv::simsurv(lambdas = 2,
                        x = cov,
                        maxt = 1,
                        dist = "exponential")

# Merge the simulated event times onto covariate data frame
dat <- data.frame(time = dat$eventtime, status = dat$status, source = cov$source)
plot(casebase::popTime(dat, event = "status", time = "time"))

sessioninfo::session_info()
#> - Session info ---------------------------------------------------------------
#>  setting  value                       
#>  version  R version 4.0.0 (2020-04-24)
#>  os       Windows 10 x64              
#>  system   x86_64, mingw32             
#>  ui       RTerm                       
#>  language (EN)                        
#>  collate  English_United States.1252  
#>  ctype    English_United States.1252  
#>  tz       America/New_York            
#>  date     2020-07-10                  
#> 
#> - Packages -------------------------------------------------------------------
#>  package     * version date       lib source        
#>  assertthat    0.2.1   2019-03-21 [1] CRAN (R 4.0.0)
#>  casebase    * 0.9.0   2020-07-03 [1] CRAN (R 4.0.2)
#>  cli           2.0.2   2020-02-28 [1] CRAN (R 4.0.0)
#>  colorspace    1.4-1   2019-03-18 [1] CRAN (R 4.0.0)
#>  crayon        1.3.4   2017-09-16 [1] CRAN (R 4.0.0)
#>  data.table    1.12.8  2019-12-09 [1] CRAN (R 4.0.0)
#>  digest        0.6.25  2020-02-23 [1] CRAN (R 4.0.0)
#>  dplyr         1.0.0   2020-05-29 [1] CRAN (R 4.0.0)
#>  ellipsis      0.3.1   2020-05-15 [1] CRAN (R 4.0.0)
#>  evaluate      0.14    2019-05-28 [1] CRAN (R 4.0.0)
#>  fansi         0.4.1   2020-01-08 [1] CRAN (R 4.0.0)
#>  farver        2.0.3   2020-01-16 [1] CRAN (R 4.0.0)
#>  generics      0.0.2   2018-11-29 [1] CRAN (R 4.0.0)
#>  ggplot2       3.3.2   2020-06-19 [1] CRAN (R 4.0.0)
#>  glue          1.4.1   2020-05-13 [1] CRAN (R 4.0.0)
#>  gtable        0.3.0   2019-03-25 [1] CRAN (R 4.0.0)
#>  highr         0.8     2019-03-20 [1] CRAN (R 4.0.0)
#>  htmltools     0.5.0   2020-06-16 [1] CRAN (R 4.0.0)
#>  knitr         1.28    2020-02-06 [1] CRAN (R 4.0.0)
#>  labeling      0.3     2014-08-23 [1] CRAN (R 4.0.0)
#>  lattice       0.20-41 2020-04-02 [2] CRAN (R 4.0.0)
#>  lifecycle     0.2.0   2020-03-06 [1] CRAN (R 4.0.0)
#>  magrittr      1.5     2014-11-22 [1] CRAN (R 4.0.0)
#>  Matrix        1.2-18  2019-11-27 [2] CRAN (R 4.0.0)
#>  mgcv          1.8-31  2019-11-09 [2] CRAN (R 4.0.0)
#>  munsell       0.5.0   2018-06-12 [1] CRAN (R 4.0.0)
#>  nlme          3.1-147 2020-04-13 [2] CRAN (R 4.0.0)
#>  pillar        1.4.4   2020-05-05 [1] CRAN (R 4.0.0)
#>  pkgconfig     2.0.3   2019-09-22 [1] CRAN (R 4.0.0)
#>  purrr         0.3.4   2020-04-17 [1] CRAN (R 4.0.0)
#>  R6            2.4.1   2019-11-12 [1] CRAN (R 4.0.0)
#>  rlang         0.4.6   2020-05-02 [1] CRAN (R 4.0.0)
#>  rmarkdown     2.3     2020-06-18 [1] CRAN (R 4.0.0)
#>  scales        1.1.1   2020-05-11 [1] CRAN (R 4.0.0)
#>  sessioninfo   1.1.1   2018-11-05 [1] CRAN (R 4.0.0)
#>  simsurv     * 0.2.3   2019-02-01 [1] CRAN (R 4.0.0)
#>  stringi       1.4.6   2020-02-17 [1] CRAN (R 4.0.0)
#>  stringr       1.4.0   2019-02-10 [1] CRAN (R 4.0.0)
#>  survival    * 3.2-3   2020-06-13 [1] CRAN (R 4.0.0)
#>  tibble        3.0.1   2020-04-20 [1] CRAN (R 4.0.0)
#>  tidyselect    1.1.0   2020-05-11 [1] CRAN (R 4.0.0)
#>  vctrs         0.3.1   2020-06-05 [1] CRAN (R 4.0.0)
#>  VGAM          1.1-3   2020-04-28 [1] CRAN (R 4.0.0)
#>  withr         2.2.0   2020-04-20 [1] CRAN (R 4.0.0)
#>  xfun          0.15    2020-06-21 [1] CRAN (R 4.0.0)
#>  yaml          2.2.1   2020-02-01 [1] CRAN (R 4.0.0)
#> 
#> [1] C:/Users/Jesse/Documents/R/win-library/4.0
#> [2] C:/Program Files/R/R-4.0.0/library

Created on 2020-07-10 by the reprex package (v0.3.0)

I simplified the example a bit because I wondered whether this was due to a "name collision" or something similar. It isn't; the problem persists without simsurv:

library(casebase)
#> See example usage at http://sahirbhatnagar.com/casebase/

dat <- transform(data.frame("latent" = rexp(100)), 
                 status = 1*(latent < 1),
                 eventtime = pmin(1, latent))

plot(casebase::popTime(dat, event = "status",
                       time = "eventtime"))

Created on 2020-07-10 by the reprex package (v0.3.0)

@Jesse-Islam brought this up before. I'm not sure whether he opened an issue, but he did provide a fix using the percentile_number argument.

The behaviour occurs because the function first tries to sample only from 'available' subjects, where n_available_at_i = DT[eventtime >= i & event != 1], i.e. subjects still under follow-up at time i who are not cases. With the current default, percentile_number = 0.5, the function samples regardless of case status only if the 50th percentile of the number of available subjects across time points is less than 10. Lowering percentile_number makes that condition easier to meet, so the sampling is less stringent.

I should document this better and give an example, or change the default.

library(casebase)
#> See example usage at http://sahirbhatnagar.com/casebase/
dat <- transform(data.frame("latent" = rexp(100)), 
                 status = 1*(latent < 1),
                 eventtime = pmin(1, latent))

pt1 <- casebase::popTime(dat, event = "status",
                        time = "eventtime",
                        percentile_number = 0.5) # default is 0.5
table(pt1$n_available)
#> 
#>  0 42 
#> 42 58

pt2 <- casebase::popTime(dat, event = "status",
                         time = "eventtime",
                         percentile_number = 0.1) 
table(pt2$n_available)
#> 
#>   0  43  44  45  46  47  48  49  50  51  52  53  54  55  56  57  58  59  60  61 
#>  42   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1 
#>  62  63  64  65  66  67  68  69  70  71  72  73  74  75  76  77  78  79  80  81 
#>   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1 
#>  82  83  84  85  86  87  88  89  90  91  92  93  94  95  96  97  98  99 100 
#>   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1
plot(pt2)

Created on 2020-07-10 by the reprex package (v0.3.0)
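
To make the rule described above concrete, here is a minimal sketch of the availability count and the percentile check. It is not casebase's actual implementation: count_available and sample_regardless are hypothetical helpers, and the assumption that non-cases are assigned 0 available subjects is inferred from the zeros in the tables above. The data are a deterministic stand-in for the simulated example (58 cases before time 1, 42 subjects censored at time 1).

```r
# Hypothetical helper (not part of casebase): for each case, count the
# subjects with eventtime >= that case's time who are not cases themselves.
# Assumption: non-cases get 0, matching the zeros in the tables above.
count_available <- function(eventtime, event) {
  vapply(seq_along(eventtime), function(j) {
    if (event[j] != 1) return(0L)
    sum(eventtime >= eventtime[j] & event != 1)
  }, integer(1))
}

# Hypothetical helper: TRUE when the chosen percentile of n_available falls
# below the threshold of 10 mentioned above, i.e. when sampling would ignore
# case status.
sample_regardless <- function(n_available, percentile_number,
                              min_available = 10) {
  unname(quantile(n_available, probs = percentile_number)) < min_available
}

# Deterministic stand-in data: 58 cases before time 1, 42 censored at time 1
eventtime <- c((1:58) / 100, rep(1, 42))
event     <- c(rep(1, 58), rep(0, 42))

n_available <- count_available(eventtime, event)
table(n_available)

sample_regardless(n_available, percentile_number = 0.5)  # FALSE
sample_regardless(n_available, percentile_number = 0.1)  # TRUE
```

Under these assumptions the median of n_available is 42, so the default keeps sampling restricted to available subjects, while the 10th percentile is 0 and lifts the restriction.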

Oh I implemented that parameter!
Completely forgot about it though, my bad. I'll close the issue.