MSKCC-Epi-Bio/tidycmprsk

`cuminc()` very slow on larger data set

Closed this issue · 5 comments

Using a larger data set (~500k rows), this function runs very slowly. I suspect some iteration or mapping step is the bottleneck.

I also get a warning (anonymized):

cuminc(Surv(time, competing_risk_outcome_fct) ~ 1, df)

[warn number]: Problem while computing `outcome = ifelse(...)`.
i Unreplaced values treated as NA as `.x` is not compatible.
i The warning occurred in group [row number]: time = [time].

Thanks @m-freitag !

I will be honest and just say that speeding up the package is not a high priority for me, and I probably won't look into this. (Sorry, just too many competing tasks at the moment!)

For sheer speed, you may want to look into this package
https://cran.r-project.org/web/packages/fastcmprsk/index.html

@tengfei-emory happy to hear your thoughts too!

Thanks @m-freitag!

I am not sure the size of the data is the real issue here. I would suggest running `surv.object <- Surv(time, competing_risk_outcome_fct)` and checking whether the Surv object triggers similar warnings. Maybe some of the 500k data points need further cleaning?
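The check suggested above can be sketched as follows. This is only a minimal illustration: the toy `df` and its factor levels are made up here to stand in for the real (anonymized) data, and the extra sanity checks are one possible way to look for values that need cleaning, not something the package requires.

```r
library(survival)

# Hypothetical toy data standing in for the real `df`.
# The first factor level is treated as censoring by Surv().
df <- data.frame(
  time = c(5, 8, 12, 3, 9, 7),
  competing_risk_outcome_fct = factor(
    c("censor", "event", "death", "event", "censor", "death"),
    levels = c("censor", "event", "death")
  )
)

# Build the Surv object on its own, outside of cuminc(),
# so that any warning can be attributed to this step.
surv.object <- with(df, Surv(time, competing_risk_outcome_fct))

# Basic sanity checks on the inputs: missing or non-positive times
# and unexpected or missing outcome levels are worth cleaning first.
anyNA(df$time)
any(df$time <= 0, na.rm = TRUE)
table(df$competing_risk_outcome_fct, useNA = "ifany")
```

If building the Surv object already raises a warning, the problem likely lies in the data rather than in `cuminc()` itself.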

Thanks in advance, and thanks Daniel and Teng for this great package.

I had a similar problem with the tidycmprsk package when dealing with large amounts of data (over 300,000 rows).

Compared to the cmprsk and survival packages, the cuminc and crr functions in tidycmprsk run too slowly, taking several hours to return a result. I really like your package and wonder if you have any suggestions regarding this issue.

fit <- cmprsk::cuminc(dat_allpop_ana_1$ftime_inc, dat_allpop_ana_1$status_inc, dat_allpop_ana_1$Result_Risk2, cencode=0)
fit1 <- cuminc(Surv(ftime_inc, status_inc) ~ Result_Risk2, dat_allpop_ana_1)
fit2 <- coxph(Surv(ftime_inc, status_inc) ~ Result_Risk2, dat_allpop_ana_1)
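
To make the speed difference easier to reproduce, the two `cuminc()` calls can be timed on simulated data. This is a rough sketch only: the data frame `dat`, its column names, and the sample size are invented for illustration, and it assumes the status column passed to tidycmprsk is a factor whose first level marks censoring.

```r
library(survival)
library(cmprsk)
library(tidycmprsk)

set.seed(1)
n <- 2000  # hypothetical size; increase gradually to see how run time scales

# Simulated competing-risks data (0 = censored, 1 and 2 = event types)
dat <- data.frame(
  ftime  = rexp(n),
  status = sample(0:2, n, replace = TRUE),
  grp    = sample(c("a", "b"), n, replace = TRUE)
)

# Factor version of the status for tidycmprsk
dat$status_fct <- factor(
  dat$status,
  levels = 0:2,
  labels = c("censor", "event", "competing")
)

# Time the cmprsk implementation
t_cmprsk <- system.time(
  fit <- cmprsk::cuminc(dat$ftime, dat$status, dat$grp, cencode = 0)
)

# Time the tidycmprsk wrapper on the same data
t_tidy <- system.time(
  fit1 <- tidycmprsk::cuminc(Surv(ftime, status_fct) ~ grp, data = dat)
)

t_cmprsk[["elapsed"]]
t_tidy[["elapsed"]]
```

Repeating this for a few values of `n` gives a rough idea of how each call scales before committing to a run on the full data.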

Package fastcmprsk was removed from the CRAN repository.

If approved, my changes should make a difference of about one order of magnitude, and I guess there is still room for further improvement. This was only a quick attempt at solving the issue. I ran into the performance problems today on a dataset with 600k rows.