SugiharaLab/rEDM

CCM error "model$set_block(data.matrix(block))"

Closed this issue · 4 comments

The same data set that caused the error with S-map is also causing the following error in the CCM method: "model$set_block(data.matrix(block))"

Again, I'm happy to share the data set (a matrix of 96407 rows x 4 features, plus timestamps).

Here is an R crash dump from one of the CCM std::bad_alloc crashes:
https://drive.google.com/open?id=1HFv0mzxUHvQZjGZE0cyuiSMUFB1Gonf6

For anyone else having issues with std::bad_alloc crashes, I found a workaround that may prevent crashes when the call to rEDM's CCM function is inside a loop. For example, the tutorial (https://cran.r-project.org/web/packages/rEDM/vignettes/rEDM-tutorial.html) has an example that calls CCM in a loop in order to compare cross mapping skill (rho) for different time to predict (tp) values.

If you have a large data set (>20k rows), first test whether a single CCM run completes without crashing. Even if it does, repeated calls to CCM in a loop (the "Time Delays with CCM" example) may still cause a crash. To get around this, manually call garbage collection with gc(), either on every iteration or every few iterations. Here's an example where I call gc() every 10 iterations to prevent a crash.

library(rEDM)
library(data.table)

vars_city <- names(speed_brake_rpm)[2:4]
tp_list <- seq(-500, 500, 100)
params_city <- expand.grid(lib_column = vars_city, target_column = vars_city, tp = tp_list)
# Drop the cases where a variable would be cross-mapped to itself.
params_city <- params_city[params_city$lib_column != params_city$target_column, ]
params_city$E <- 9

output <- list()
step_size <- 10
n_runs <- NROW(params_city)

# Run CCM in batches of step_size, forcing a garbage collection before each batch.
# ceiling() avoids a trailing out-of-range batch when n_runs is a multiple of step_size.
for (n in seq_len(ceiling(n_runs / step_size)) - 1) {
    gc()
    for (i in (n * step_size + 1):min((n + 1) * step_size, n_runs)) {
        output[[i]] <- ccm(speed_brake_rpm, E = params_city$E[i],
                           lib_sizes = NROW(speed_brake_rpm),
                           lib_column = params_city$lib_column[i],
                           target_column = params_city$target_column[i],
                           tp = params_city$tp[i], random_libs = FALSE,
                           num_neighbors = "E+1", silent = TRUE)
    }
}

output_df <- rbindlist(output)
output_df$direction <- paste(output_df$lib_column, "xmap to\n", output_df$target_column)
save(output_df, file = "ccm_optimal_tp.Rda")
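
The same idea of calling gc() every few iterations can probably be written as a single loop with a modulo check instead of nested batches. The sketch below is only an illustration: run_one() is a hypothetical stand-in for a single ccm() call so that the snippet runs without rEDM or the data set, and gc_every is an assumed interval that would need tuning.

```r
# Hypothetical stand-in for one ccm() call: it allocates a temporary vector
# and returns a one-row data frame, so the sketch runs without rEDM installed.
run_one <- function(tp) {
    scratch <- numeric(1e5)
    data.frame(tp = tp, n = length(scratch))
}

tp_list <- seq(-500, 500, 100)
gc_every <- 10   # assumed interval between forced collections
results <- vector("list", length(tp_list))

for (i in seq_along(tp_list)) {
    # Force a garbage collection before runs 1, 11, 21, ...
    if ((i - 1) %% gc_every == 0) {
        invisible(gc())
    }
    results[[i]] <- run_one(tp_list[i])
}

# Combine the per-run results into one data frame.
results_df <- do.call(rbind, results)
```

Preallocating the results list with vector("list", n) also avoids growing the list on every iteration, which may help a little when memory is already tight.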

ha0ye commented

Thanks. Did you figure out the source of #34, i.e. why there was a memory error for S-map but not for simplex?

Dr. Ye,
Unfortunately, I believe this is simply a memory-management bug somewhere in the libraries rEDM depends on. When running rEDM functions on Windows with larger data sets, memory utilization stays above 99% until either the session crashes or a manual call to gc() briefly postpones the crash.
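
One rough way to watch this from inside R is to log the figures that gc() itself reports. The sketch below is a minimal illustration, not a diagnosis of the rEDM crash: r_mem_mb() is a hypothetical helper, and it only sees memory managed by R, so it would likely not count memory allocated directly by compiled code.

```r
# Hypothetical helper: total memory currently used by R objects, in Mb.
# Column 2 of the matrix returned by gc() is the "(Mb)" figure for "used".
r_mem_mb <- function() {
    sum(gc()[, 2])
}

before <- r_mem_mb()
big <- numeric(5e6)      # roughly 40 MB of doubles
during <- r_mem_mb()
rm(big)
after <- r_mem_mb()      # gc() inside the helper reclaims the freed vector

cat(sprintf("before: %.1f Mb, with big vector: %.1f Mb, after rm + gc: %.1f Mb\n",
            before, during, after))
```

Logging a value like this before and after each call in a loop should at least show whether the R-side heap grows between iterations.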

Contrary to the above hypothesis, when running S-map and CCM on a subset of the larger data sets, the methods still crash almost immediately. I triple-checked that the data was 'cleaned', and simplex finished on it, so there may be something unique to those methods that isn't part of simplex projection.

If I have some free time, I'll try jumping between the crash log and the code to see what may be different.