HenrikBengtsson/future.batchtools

huge results file with 'conditions' - performance bottleneck

kkmann opened this issue · 4 comments

Hi,

I used future.batchtools with the Slurm backend for a furrr operation (thanks btw, very convenient). The actual loop runs as expected, but there is a huge overhead from reading in the results .rds file from the .future subfolder. The issue seems to be the 'condition' fields in the results object. I checked them manually and they seem to be harmless, but they take up a lot of disk space. How can I prevent the recorded conditions from bloating the future.batchtools results? In my example, the performance bottleneck seems to be the I/O of reading the results back.

Thanks!

Is that related to #65?

Futures collect all conditions by default in order to relay them to your main R session. This is how messages and warnings can appear in the main session when we run in parallel. The behavior is controlled by the conditions argument, whose default is future(..., conditions = "condition"). Since messages and warnings are both of class condition, this default captures both types. To capture and relay only, say, warnings, use future(..., conditions = "warning"). To achieve the same with furrr, you can do something like:

y <- future_map(..., .options = furrr_options(conditions = "warning"))
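For a fuller picture, here is a minimal sketch of the same idea on a toy input. The function noisy_square() and the multisession plan are assumptions made for illustration only; they are not from the original report, which used a Slurm backend. The second call assumes, per my reading of the furrr_options() documentation, that an empty character vector disables capturing of non-error conditions altogether.

```r
library(future)
library(furrr)

# Assumption: a local multisession plan stands in for the Slurm backend.
plan(multisession, workers = 2)

# Hypothetical worker function that is chatty and occasionally warns.
noisy_square <- function(x) {
  message("processing ", x)
  if (x > 2) warning("large input: ", x)
  x^2
}

# Capture and relay warnings only; messages are not stored with the results.
y <- future_map(1:3, noisy_square,
                .options = furrr_options(conditions = "warning"))

# Capture no conditions at all (errors are still signaled).
y_quiet <- future_map(1:3, noisy_square,
                      .options = furrr_options(conditions = character(0)))

# Reset to the default plan and shut down the workers.
plan(sequential)
```

Whether this shrinks the results files noticeably depends on how many conditions of the dropped classes the workers were producing in the first place.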

> The issue seem to be the 'condition' fields in the results object.

It would be useful to know what type of conditions are captured. Do you get a lot of warnings (from warning()), a lot of messages (from message()), or something else? An excessive number of messages would suggest that there's some verbose/debug output that you might want to turn off. If you can't easily tell from the output, you could do something like:

withCallingHandlers({
  y <- future_map(...)  ## with the default 'conditions'
}, condition = function(cond) message("Captured condition: ", class(cond)[1]))
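If there are too many conditions to read one by one, a variation on the handler above can count them per class instead of printing each. This is only a sketch in base R; tally_conditions() is a hypothetical helper, and the braced block at the end stands in for the future_map(...) call.

```r
# Hypothetical helper: evaluate `expr` and count the conditions it signals,
# grouped by their most specific class. Conditions are not muffled, so
# messages and warnings still show up as usual.
tally_conditions <- function(expr) {
  seen <- character(0)
  value <- withCallingHandlers(expr, condition = function(cond) {
    seen <<- c(seen, class(cond)[1])
  })
  list(value = value, counts = table(seen))
}

# Stand-in for: tally_conditions(future_map(...))
res <- tally_conditions({
  message("step 1")
  message("step 2")
  warning("careful")
  42
})

print(res$counts)
```

A count dominated by one class (for example simpleMessage) points at the output worth silencing or excluding.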

I see. Fortunately no warnings, otherwise I'd have picked it up earlier. Thanks!

FWIW, in the next release of the future package (HenrikBengtsson/future@8102359), it'll be possible to specify not only which condition classes to capture and relay but also which to ignore, e.g.

library(future.batchtools)
plan(batchtools_local)

f <- future({ 
  message("foo")
  warning("bar")
  42
}, conditions = structure("condition", exclude = "message"))

v <- value(f)
## Warning message:
## In eval(quote({ : bar

Note how the message condition is ignored; formally it's muffled on the worker so in this case it's not saved to the batchtools registry.