popTime bug
There seems to be a weird bug where the cases are only distributed over a restricted range of the plot.
library(survival)
library(casebase)
#> Warning: package 'casebase' was built under R version 4.0.2
#> See example usage at http://sahirbhatnagar.com/casebase/
library(simsurv)
#>
#> Attaching package: 'simsurv'
#> The following object is masked from 'package:casebase':
#>
#> brcancer
#plot(casebase::popTime(dat,time="time",event="status"))
samples <- 400
cov <- data.frame(id = 1:samples, source = rep("original", samples))
# Simulate the event times
dat <- simsurv::simsurv(lambdas = 2,
                        x = cov,
                        maxt = 1,
                        dist = "exponential")
# Merge the simulated event times onto covariate data frame
dat <- data.frame(time = dat$eventtime, status = dat$status, source = cov$source)
plot(casebase::popTime(dat, event = "status", time = "time"))
sessioninfo::session_info()
#> - Session info ---------------------------------------------------------------
#> setting value
#> version R version 4.0.0 (2020-04-24)
#> os Windows 10 x64
#> system x86_64, mingw32
#> ui RTerm
#> language (EN)
#> collate English_United States.1252
#> ctype English_United States.1252
#> tz America/New_York
#> date 2020-07-10
#>
#> - Packages -------------------------------------------------------------------
#> package * version date lib source
#> assertthat 0.2.1 2019-03-21 [1] CRAN (R 4.0.0)
#> casebase * 0.9.0 2020-07-03 [1] CRAN (R 4.0.2)
#> cli 2.0.2 2020-02-28 [1] CRAN (R 4.0.0)
#> colorspace 1.4-1 2019-03-18 [1] CRAN (R 4.0.0)
#> crayon 1.3.4 2017-09-16 [1] CRAN (R 4.0.0)
#> data.table 1.12.8 2019-12-09 [1] CRAN (R 4.0.0)
#> digest 0.6.25 2020-02-23 [1] CRAN (R 4.0.0)
#> dplyr 1.0.0 2020-05-29 [1] CRAN (R 4.0.0)
#> ellipsis 0.3.1 2020-05-15 [1] CRAN (R 4.0.0)
#> evaluate 0.14 2019-05-28 [1] CRAN (R 4.0.0)
#> fansi 0.4.1 2020-01-08 [1] CRAN (R 4.0.0)
#> farver 2.0.3 2020-01-16 [1] CRAN (R 4.0.0)
#> generics 0.0.2 2018-11-29 [1] CRAN (R 4.0.0)
#> ggplot2 3.3.2 2020-06-19 [1] CRAN (R 4.0.0)
#> glue 1.4.1 2020-05-13 [1] CRAN (R 4.0.0)
#> gtable 0.3.0 2019-03-25 [1] CRAN (R 4.0.0)
#> highr 0.8 2019-03-20 [1] CRAN (R 4.0.0)
#> htmltools 0.5.0 2020-06-16 [1] CRAN (R 4.0.0)
#> knitr 1.28 2020-02-06 [1] CRAN (R 4.0.0)
#> labeling 0.3 2014-08-23 [1] CRAN (R 4.0.0)
#> lattice 0.20-41 2020-04-02 [2] CRAN (R 4.0.0)
#> lifecycle 0.2.0 2020-03-06 [1] CRAN (R 4.0.0)
#> magrittr 1.5 2014-11-22 [1] CRAN (R 4.0.0)
#> Matrix 1.2-18 2019-11-27 [2] CRAN (R 4.0.0)
#> mgcv 1.8-31 2019-11-09 [2] CRAN (R 4.0.0)
#> munsell 0.5.0 2018-06-12 [1] CRAN (R 4.0.0)
#> nlme 3.1-147 2020-04-13 [2] CRAN (R 4.0.0)
#> pillar 1.4.4 2020-05-05 [1] CRAN (R 4.0.0)
#> pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.0.0)
#> purrr 0.3.4 2020-04-17 [1] CRAN (R 4.0.0)
#> R6 2.4.1 2019-11-12 [1] CRAN (R 4.0.0)
#> rlang 0.4.6 2020-05-02 [1] CRAN (R 4.0.0)
#> rmarkdown 2.3 2020-06-18 [1] CRAN (R 4.0.0)
#> scales 1.1.1 2020-05-11 [1] CRAN (R 4.0.0)
#> sessioninfo 1.1.1 2018-11-05 [1] CRAN (R 4.0.0)
#> simsurv * 0.2.3 2019-02-01 [1] CRAN (R 4.0.0)
#> stringi 1.4.6 2020-02-17 [1] CRAN (R 4.0.0)
#> stringr 1.4.0 2019-02-10 [1] CRAN (R 4.0.0)
#> survival * 3.2-3 2020-06-13 [1] CRAN (R 4.0.0)
#> tibble 3.0.1 2020-04-20 [1] CRAN (R 4.0.0)
#> tidyselect 1.1.0 2020-05-11 [1] CRAN (R 4.0.0)
#> vctrs 0.3.1 2020-06-05 [1] CRAN (R 4.0.0)
#> VGAM 1.1-3 2020-04-28 [1] CRAN (R 4.0.0)
#> withr 2.2.0 2020-04-20 [1] CRAN (R 4.0.0)
#> xfun 0.15 2020-06-21 [1] CRAN (R 4.0.0)
#> yaml 2.2.1 2020-02-01 [1] CRAN (R 4.0.0)
#>
#> [1] C:/Users/Jesse/Documents/R/win-library/4.0
#> [2] C:/Program Files/R/R-4.0.0/library
Created on 2020-07-10 by the reprex package (v0.3.0)
I simplified the example a bit because I wondered whether the problem was caused by a name collision or something similar. It isn't:
library(casebase)
#> See example usage at http://sahirbhatnagar.com/casebase/
dat <- transform(data.frame("latent" = rexp(100)),
                 status = 1*(latent < 1),
                 eventtime = pmin(1, latent))
plot(casebase::popTime(dat, event = "status",
                       time = "eventtime"))
Created on 2020-07-10 by the reprex package (v0.3.0)
@Jesse-Islam brought this up before. I'm not sure whether he opened an issue, but he did provide a fix using the `percentile_number` argument. The problem occurs because the function first tries to sample only from 'available' subjects, where `n_available_at_i = DT[eventtime >= i & event != 1]`. The default is currently `percentile_number = 0.5`, which means: if the 50th percentile of the number of available subjects across time points is less than 10, then sample regardless of case status. Decreasing this threshold makes the check less stringent. I should document this better and give an example, or change the default.
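The selection rule described above can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the actual `popTime()` implementation: `count_available()` and `sample_regardless()` are hypothetical helpers, the cut-off of 10 is taken from the description above, and setting non-case rows to 0 is inferred from the `table(pt1$n_available)` output below.

```r
# Hypothetical helpers (NOT part of casebase): they only illustrate the rule.

# For each case i, count the subjects still "available" at its event time t_i:
# subjects with time >= t_i who are not cases (event != 1).
# Assumption: non-case rows get 0, as the table(pt1$n_available) output suggests.
count_available <- function(time, event) {
  counts <- vapply(time, function(t_i) sum(time >= t_i & event != 1), integer(1))
  ifelse(event == 1, counts, 0L)
}

# Returns TRUE  -> fall back to sampling regardless of case status
#         FALSE -> sample only from available (non-case) subjects
# Assumption: the quantile is taken over all rows and compared against 10.
sample_regardless <- function(time, event, percentile_number = 0.5,
                              min_available = 10) {
  n_available <- count_available(time, event)
  unname(stats::quantile(n_available, probs = percentile_number)) < min_available
}
```

With data like the simplified reprex (most subjects are cases and every non-case is censored at t = 1), each case sees the same small pool of censored subjects, so a high `percentile_number` keeps that restricted pool while a low one triggers the fallback.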
library(casebase)
#> See example usage at http://sahirbhatnagar.com/casebase/
dat <- transform(data.frame("latent" = rexp(100)),
                 status = 1*(latent < 1),
                 eventtime = pmin(1, latent))
pt1 <- casebase::popTime(dat, event = "status",
                         time = "eventtime",
                         percentile_number = 0.5) # default is 0.5
table(pt1$n_available)
#>
#> 0 42
#> 42 58
pt2 <- casebase::popTime(dat, event = "status",
                         time = "eventtime",
                         percentile_number = 0.1)
table(pt2$n_available)
#>
#> 0 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61
#> 42 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
#> 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81
#> 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
#> 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100
#> 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
plot(pt2)
Created on 2020-07-10 by the reprex package (v0.3.0)
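To see why `percentile_number = 0.1` changes the picture, the threshold check can be redone by hand on the `n_available` distribution reported for `pt1` (42 rows with 0, 58 rows with 42). This sketch assumes `popTime()` uses R's default `quantile()` (type 7) and the cut-off of 10 described in the thread; it rebuilds the vector from the printed table rather than from the original random data.

```r
# pt1$n_available as reported by table() above: 42 rows with 0, 58 rows with 42
n_available <- c(rep(0, 42), rep(42, 58))

# Default percentile_number = 0.5: the median is 42, which is not below 10,
# so cases are placed using only the "available" (non-case) subjects
median_available <- unname(quantile(n_available, probs = 0.5))
median_available  # 42

# percentile_number = 0.1: the 10th percentile is 0, which is below 10,
# so sampling falls back to ignoring case status
p10_available <- unname(quantile(n_available, probs = 0.1))
p10_available  # 0
```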
Oh, I implemented that parameter!
Completely forgot about it though, my bad. I'll close the issue.