epiforecasts/EpiNow

Memory leak

seabbs opened this issue · 3 comments

When running over many regions there is a memory leak that causes an out of RAM error and ultimately leading to estimation failing.

There is a gist here that provides a reprex.

Via turning areas in the code off I believe I have localized the issue to here:

EpiNow/R/estimate_R0.R

Lines 190 to 239 in 0d69358

## Forecast Rts using the mean estimate
rt_forecasts <-
data.table::setDT(
safe_forecast(rts = est_r[sample == 1, .(date, rt = mean_R)],
model = forecast_model,
horizon = horizon,
samples = rt_samples)[[1]]
)
##Forecast the variance using the same model structure
sd_forecasts <-
data.table::setDT(
safe_forecast(rts = est_r[sample == 1, .(date, rt = sd_R)],
model = forecast_model,
horizon = horizon,
samples = rt_samples)[[1]]
)[, sd_rt := rt][,rt := NULL]
## Join mean and sd forecasts
rt_forecasts <- rt_forecasts[sd_forecasts, on = c("date", "sample", "horizon")]
## Add gamma noise and drop mean and sd forecasts
## Assume that the minumum allowed gamma noise is 1e-4
## Zero sd will result in rgamma draws that are also 0 and so must be avoided
rt_forecasts <- rt_forecasts[sd_rt <= 0, sd_rt := 1e-4][,
rt := purrr::map2_dbl(rt, sd_rt, ~ mean_rgamma(1, mean = .x, sd = .y))][,
.(sample, date, horizon, rt)]
## Forecast cases
case_forecasts <-
data.table::setDT(
EpiSoon::forecast_cases(
cases = summed_cases,
fit_samples = rt_forecasts,
rdist = rpois,
serial_interval = generation_times[, index]
)
)[,rt_type := "forecast"]
## Join Rt estimates and forecasts
est_r <- est_r[, `:=`(R = sample_R, rt_type = "nowcast")][,
`:=`(mean_R = NULL, sd_R = NULL, sample_R = NULL)][,
.(date, R, sample, crps, window, rt_type)]
est_r <- data.table::rbindlist(
list(est_r,
rt_forecasts[,.(date, R = rt, sample, rt_type = "forecast")]
), fill = TRUE)
return(list(rts = est_r, cases = case_forecasts))

This area of the code runs a forecast and relies on EpiSoon, data.table, purrr, fable and fabletools any of which may be the source of the problem.

This deduction may also be a false positive with another area of the code to blame. All code here that is not from other packages has recently been rewritten from the tidyverse to data.table.

Hi,
I didn't have any luck with profvis, but using lineprof:

library(EpiNow)
library(lineprof)
prof <- lineprof(EpiNow::estimate_R0(cases = EpiSoon::example_obs_cases,
                          generation_times = as.matrix(EpiNow::covid_generation_times[,1]),
                           rt_prior = list(mean_prior = 2.6, std_prior = 2),
                          windows = c(1, 3, 7), rt_samples = 10, gt_samples = 2,
                          min_est_date =  as.Date("2020-02-18"),
                          forecast_model = function(...){EpiSoon::fable_model(model = fable::ETS(y ~ trend("A")), ...)},
                         horizon = 7), torture = TRUE

The only thing that came up was EpiEstim::estimate_R leaking about 1Mb (and not obviously in a single line, at least as far as I can tell). I guess that's not enough to explain your issue.

Attached rds if you want to take a look.
lineprof.rds.zip

R version 3.6.2 (2019-12-12)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Catalina 10.15.4

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib

locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] shiny_1.4.0       lineprof_0.1.9001 EpiNow_0.2.0

loaded via a namespace (and not attached):
  [1] mcmc_0.9-7              matrixStats_0.56.0      xts_0.12-0
  [4] lubridate_1.7.8         bsts_0.9.5              rstan_2.19.3
  [7] tools_3.6.2             R6_2.4.1                coarseDataTools_0.6-5
 [10] colorspace_1.4-1        npsurv_0.4-0.1          tidyselect_1.0.0
 [13] gridExtra_2.3           prettyunits_1.1.1       EpiEstim_2.2-1
 [16] processx_3.4.2          compiler_3.6.2          cli_2.0.2
 [19] quantreg_5.55           HDInterval_0.2.0        SparseM_1.78
 [22] scales_1.1.0            callr_3.4.3             goftest_1.2-2
 [25] stringr_1.4.0           digest_0.6.25           StanHeaders_2.21.0-1
 [28] tsibble_0.8.6           R.utils_2.9.2           MCMCpack_1.4-7
 [31] base64enc_0.1-3         Boom_0.9.6              pkgconfig_2.0.3
 [34] htmltools_0.4.0         fastmap_1.0.1           scoringRules_1.0.0
 [37] rlang_0.4.6             generics_0.0.2          jsonlite_1.6.1
 [40] zoo_1.8-8               dplyr_0.8.5             R.oo_1.23.0
 [43] inline_0.3.15           magrittr_1.5            EpiSoon_0.3.0
 [46] R.devices_2.16.1        loo_2.2.0               patchwork_1.0.0
 [49] Matrix_1.2-18           Rcpp_1.0.4.6            munsell_0.5.0
 [52] fansi_0.4.1             lifecycle_0.2.0         R.methodsS3_1.8.0
 [55] furrr_0.1.0             stringi_1.4.6           MASS_7.3-51.4
 [58] pkgbuild_1.0.7          plyr_1.8.6              grid_3.6.2
 [61] parallel_3.6.2          BoomSpikeSlab_1.2.3     listenv_0.8.0
 [64] promises_1.1.0          crayon_1.3.4            lattice_0.20-38
 [67] cowplot_1.0.0           splines_3.6.2           knitr_1.28
 [70] anytime_0.3.7           ps_1.3.2                pillar_1.4.4
 [73] future.apply_1.4.0      reshape2_1.4.3          codetools_0.2-16
 [76] stats4_3.6.2            glue_1.4.0              lsei_1.2-0.1
 [79] fabletools_0.1.3        data.table_1.12.8       vctrs_0.2.4
 [82] httpuv_1.5.2            MatrixModels_0.4-1      gtable_0.3.0
 [85] purrr_0.3.4             tidyr_1.0.2             future_1.17.0
 [88] incidence_1.7.1         assertthat_0.2.1        ggplot2_3.3.0
 [91] xfun_0.12               mime_0.9                xtable_1.8-4
 [94] coda_0.19-3             later_1.0.0             scoringutils_0.0.0.9000
 [97] survival_3.1-8          tibble_3.0.1            globals_0.12.5
[100] fitdistrplus_1.0-14     ellipsis_0.3.0

Hi John,

Thanks for taking a look (and for the offer of help - see email).

I haven't tried lineprof (only profvis) so I will take a look. Even a small memory leak here could be at fault if it scales with samples (1000 per country so ~ 1Gb per country at least).

Sam

Further investigation indicates that the root cause of the memory leak is the fable/fabletools package (potentially linked to this issue: tidyverts/fable#230).

Dropping the fabletools model defined in the reprex and replacing it with the following:

function(...) {EpiSoon::bsts_model(model = 
                      function(ss, y){bsts::AddSemilocalLinearTrend(ss, y = y)}, ...)}

Appears to fix the RAM issue. This is not a drop-in replacement however as in testing the bsts models have generally performed worse than those from fable. Possible options that are also likely to perform well are the forecastHybrid package, the forecast package and the brms package. None of these are currently supported by EpiSoon with an appropriate wrapper which will first need to be done before this can tested further.