HannaMeyer/CAST

model filesize because of perf_all in the ffs

Ludwigm6 opened this issue · 1 comments

The size of the model object (especially in RAM) gets quite large with many predictors.
This is caused by the large data.frame created for perf_all, whose rows and columns are allocated according to the number of predictors and model runs.
Line 224 gets rid of the empty rows at the bottom of the df, but empty columns are still left after the ffs stops. E.g. with 116 predictors, 8 got selected by the ffs, yet perf_all still keeps a column for every predictor, 119 columns in total:

> length(colnames(perf_all_big$perf_all))
[1] 119

To get rid of the empty columns you could use, e.g.:

bestmodel$perf_all <- bestmodel$perf_all[,colSums(is.na(bestmodel$perf_all)) != nrow(bestmodel$perf_all)]
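
The same idea as a minimal, self-contained sketch on a toy data frame. The names `drop_empty_cols` and `toy` are made up for illustration and are not part of CAST. The sketch adds `drop = FALSE`, which the one-liner above does not have, so the result stays a data.frame even if only one column survives:

```r
# Hypothetical helper, not part of CAST: remove columns that contain only NA.
drop_empty_cols <- function(df) {
  # TRUE for every column that has at least one non-NA value
  keep <- colSums(is.na(df)) != nrow(df)
  # drop = FALSE keeps the data.frame structure with a single remaining column
  df[, keep, drop = FALSE]
}

# Toy stand-in for perf_all: var3 was never filled, so it is all NA.
toy <- data.frame(
  var1 = c("a", "b", "c"),
  var2 = c(NA, "d", "e"),
  var3 = c(NA, NA, NA),
  RMSE = c(1.2, 1.0, 0.9)
)

trimmed <- drop_empty_cols(toy)
colnames(trimmed)
```

Columns that are only partly NA (like `var2` here) are kept, so the rows of earlier, smaller models stay intact.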

Here is the example again, showing the reduced size:

> cutting <- big_perf_all[, colSums(is.na(big_perf_all)) != nrow(big_perf_all)]
> object.size(big_perf_all)
3604768 bytes (3.4 mb)
> object.size(cutting)
480784 bytes (0.4 mb)
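
For anyone who wants to reproduce the effect without a fitted model, here is a rough sketch on synthetic data. The dimensions (1000 rows, 100 columns, 8 filled) and the names `wide` and `narrow` are assumptions chosen for illustration, so the sizes will differ from the numbers above:

```r
# Synthetic stand-in for a preallocated perf_all (not real ffs output):
# 1000 rows, 100 character columns, of which only the first 8 get filled.
n_rows <- 1000
wide <- as.data.frame(
  matrix(NA_character_, nrow = n_rows, ncol = 100),
  stringsAsFactors = FALSE
)
wide[, 1:8] <- "predictor_name"

# Drop the columns that are NA in every row.
narrow <- wide[, colSums(is.na(wide)) != nrow(wide), drop = FALSE]

size_before <- object.size(wide)
size_after  <- object.size(narrow)

# Human-readable sizes; units = "auto" picks a suitable unit.
format(size_before, units = "auto")
format(size_after, units = "auto")
ncol(narrow)
```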

Greetings, Marvin

Thanks Marvin. I changed it!