DavisVaughan/furrr

UPCOMING: future(... stdout = structure(TRUE, drop = TRUE))

FYI, in the next release of future there will be:

NEW FEATURES:

  • Now f <- future(..., stdout = structure(TRUE, drop = TRUE)) will
    cause the captured standard output to be dropped from the future
    object as soon as it has been relayed once, for instance, by
    value(f). Similarly, conditions = structure("conditions", drop =
    TRUE) will drop captured non-error conditions as soon as they have
    been relayed. This can help decrease the amount of memory used,
    especially if there are many active futures.
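
For illustration, a minimal sketch of what the stdout part of this could look like in practice, assuming a version of future that supports the drop attribute. The sequential plan and the toy expression are only there to keep the example self-contained.

```r
library(future)
plan(sequential)  # assumption: any backend should behave the same way here

# Capture stdout, but ask for it to be dropped once it has been relayed
f <- future({
  cat("Hello from the future\n")
  42L
}, stdout = structure(TRUE, drop = TRUE))

# The first value() call relays the captured output
v1 <- value(f)

# The value itself stays available; with drop = TRUE supported, the
# captured output should no longer be stored in 'f', so it is not
# expected to be relayed a second time
v2 <- value(f)
```

Without the drop attribute, the captured output stays attached to f for as long as f exists.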

You can prepare for this already by making these settings your new defaults in:

furrr/R/furrr-options.R

Lines 125 to 127 in bfb1ce3

furrr_options <- function(...,
                          stdout = TRUE,
                          conditions = NULL,

I've already prepared future.apply and doFuture to do this, so once the next version of future is released, they'll start dropping captured stdout and conditions asap (= as soon as they've been relayed).

Is the idea in your future.apply commit below that since future_lapply() and friends only return the value() rather than the actual future object, there is never any need to retain the stdout/condition objects, so they should always be dropped?

https://github.com/HenrikBengtsson/future.apply/blob/66a952bd7872e48d9d5065aecd216935264422e3/R/future_xapply.R#L103-L108

> Is the idea in your future.apply commit below that since future_lapply() and friends only return the value() rather than the actual future object, there is never any need to retain the stdout/condition objects, so they should always be dropped?

Correct.

Consider the following toy example:

y <- future.apply::future_lapply(1:1e6, FUN = function(x) {
  for (kk in 1:1e3) message(kk)
  42L
}, future.chunk.size = 1L)

Without this feature, we would hold onto all 10^6 * 10^3 = 10^9 captured message conditions until the very end, when the 10^6 future objects are removed. With the new feature, the message conditions are dropped as soon as they're relayed. In the best-case scenario, everything is processed in order and things are relayed asap. In the worst-case scenario, the first element (x = 1) completes last, in which case nothing can be dropped until that is done (because all stdout and conditions are relayed in order of chunks processed). Clear as mud?
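
To make the in-order relaying a bit more concrete, here is a scaled-down sketch that uses plain futures. It is not how future.apply implements this internally. The names n_futures and n_messages are made up for the example, and the class name "condition" (R's base condition class) is an assumption on my part.

```r
library(future)
plan(sequential)

n_futures  <- 3L  # stand-in for the 10^6 elements above
n_messages <- 2L  # stand-in for the 10^3 messages per element

# One future per element, each capturing its conditions with drop = TRUE
fs <- lapply(seq_len(n_futures), function(x) {
  future({
    for (kk in seq_len(n_messages)) message(kk)
    42L
  }, conditions = structure("condition", drop = TRUE))
})

# Collect values in order; each value() call relays that future's
# messages, after which (if drop = TRUE is supported) they no longer
# need to be kept in the future object
y <- lapply(fs, value)
```

If the first future in fs were the slowest one, the messages captured by all the later futures would have to wait for it before they could be relayed and dropped.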