Include simple .progress switch in future_lapply
Opened this issue · 2 comments
Both purrr and furrr's map* functions have a .progress switch to enable a progress bar. It would be very convenient to also have it available, e.g. in future_lapply.
Here is a quick implementation of a .progress switch that uses a CLI-based progress bar, which is also used in purrr and seems to be favoured by furrr:
library("future")
library("future.apply")
library("progressr")

# Toy worker: sleeps briefly and returns its input
f <- function(x) {
  Sys.sleep(.01)
  x
}

future_lapply_progress <- function(X, FUN, ..., .progress = FALSE) {
  # Note: this sets the progressr handler for the whole session
  handlers("cli")
  if (isTRUE(.progress)) {
    with_progress({
      nX <- length(X)
      p <- progressor(nX)
      # Wrap FUN so that each finished element signals one progress step
      FUN_progress <- function(X, ...) {
        res <- FUN(X, ...)
        p()
        res
      }
      future_lapply(X, FUN_progress, ...)
    })
  } else {
    future_lapply(X, FUN, ...)
  }
}

res <- future_lapply_progress(1:100, f, .progress = FALSE)
res <- future_lapply_progress(1:100, f, .progress = TRUE)
I know about Futureverse's "back-end/front-end" design philosophy of separating progress signaling from displaying progress. However, if one quickly wants to parallelize some lapply-type code in RStudio, having a simple .progress switch is more convenient than having to set up the progressor, include it in the function to be parallelized, wrap it in with_progress(), and remember or research how these three pieces fit together. Also, this implementation could possibly serve as a template for updating furrr's .progress implementation to use the cli package.
To start out, you're not the first to ask for this, and you won't be the last. The problem is not the implementation - the problem is the API and the contract to the developer and end-user.
I know about Futureverse's "back-end/front-end" design philosophy of separating progress signaling from displaying progress.
This is key. I 100% understand that a .progress argument would be convenient etc. Also, I don't want to lock the design into a specific solution for progress reporting. It could be that someone else comes up with a better framework tomorrow.
Another argument is the design contract of future.apply, which says it should work just like base-R apply functions - no more, no less. As soon as we start adding additional features (other wishes are out there), it is no longer a one-to-one mapping, and all of a sudden we have made it slightly more complicated to go from lapply() to future_lapply() and back.
One could also argue that if progress reporting should be built into future.apply, then it should also be built into base-R apply functions. Along these lines, I think furrr implemented its own progress reporting before purrr had one. I could imagine that furrr will try to re-align its API with that of purrr at some point. OTOH, this is not an easy move to make because less than a year ago, the roadmap was "furrr currently has its own progress bar through the usage of .progress = TRUE, but in the future this will be deprecated in favor of generic and robust progress updates through the progressr package." [1]. Davis touches on this problem at the end of [1] saying "progressr represents an exciting move towards a unified framework for progress notifications in R, but it is still early in its development cycle and needs more usage and feedback to settle on the best API. In the future, the plan is for furrr to become more tightly integrated with progressr so that this is much easier."
[1] https://furrr.futureverse.org/articles/progress.html
A better alternative is to provide a mechanism for "injecting" a progress reporter to any map-reduce function, where future.apply is completely unaware of the solution. With some syntactic sugar, it might end up looking like:
y <- future_lapply(X, f) %with_progress% TRUE
y <- lapply(X, f) %with_progress% TRUE
y <- future_map(X, f) %with_progress% TRUE
y <- map(X, f) %with_progress% TRUE
or
y <- future_lapply(X, P(f))
y <- lapply(X, P(f))
y <- future_map(X, P(f))
y <- map(X, P(f))
or
y <- future_lapply(P(X), f)
y <- lapply(P(X), f)
y <- future_map(P(X), f)
y <- map(P(X), f)
I've played around with variants of this over the years, but have yet to find a satisfying solution that is not "hackish". Also, this is something that can be solved by the R community and not just me. Gabor has done some work along these lines, cf. https://cli.r-lib.org/articles/progress.html#progress-bars-for-mapping-functions-cli_progress_along.
All that said, there is nothing preventing someone else from building an API on top of the existing future-based map-reduce solutions, just like you did. We have seen this happening before outside Futureverse, e.g. pbmcapply and pbapply.
So, sorry for being that conservative.
... wrapping it in a with_progress() and remembering/researching how these three pieces fit together. Also, this implementation would possibly serve as a template to update the furrr's .progress implementation to using the CLI package.
As you know, I say the user should be in full control of the progress reporting. So, they should use with_progress(), or better, handlers(global = TRUE). They should also be in charge of which kind of progress reporting is used. Remember, it's only recently that the cli package replaced the progress package as the de-facto standard in Tidyverse. Who knows what will be the standard in a few years. By giving the end user the control, you'll increase the chances for a uniform experience across packages and over time.
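For reference, here is a minimal sketch of that division of labour, assuming the future.apply and progressr packages are installed. The function name slow_identity is made up for illustration; the "txtprogressbar" handler is used only because it needs no extra package.

```r
library(future.apply)
library(progressr)

# The function only signals progress; it never decides how (or whether)
# progress is displayed. slow_identity() is a hypothetical example function.
slow_identity <- function(xs) {
  p <- progressor(along = xs)
  future_lapply(xs, function(x) {
    Sys.sleep(0.01)
    p()  # signal one step
    x
  })
}

# The end user picks the kind of progress reporting ...
handlers("txtprogressbar")  # e.g. handlers("cli") if cli is installed

# ... and opts in to progress for this particular call.
res <- with_progress(slow_identity(1:20))

# Alternative: opt in once per session with handlers(global = TRUE),
# after which a plain slow_identity(1:20) reports progress.
```

Note that progressr only displays progress in interactive sessions by default, so the same code stays quiet when run as a batch script.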
Progress reporting in R is still in its infancy. There are so many more things that need to be figured out. Nested progress reporting is one, especially since more and more functions are starting to report on progress. What should happen when such functions call each other? Maybe a standard will emerge for generic hook functions that progress frameworks can hook into? I don't know the answer to this, but I'm trying to keep as many doors as possible open for when that day comes.
FWIW, I'm also learning about the best design patterns the more I work with progress reporting myself. For example, instead of:
FUN_progress <- function(X, ...) {
  res <- FUN(X, ...)
  p()
  res
}
I tend to like:
FUN_progress <- function(X, ...) {
  on.exit(p())
  FUN(X, ...)
}
a bit more.
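One practical difference between the two variants, which may or may not be the reason for the preference, is that on.exit() also fires when FUN() throws an error. The base-R sketch below illustrates this. The names make_counter, flaky, wrap_after, wrap_on_exit and run_all are hypothetical, and the counter only stands in for a real progressor.

```r
# Hypothetical stand-in for a progressor: counts how often it is called.
make_counter <- function() {
  n <- 0L
  list(tick = function() n <<- n + 1L, count = function() n)
}

# A worker that fails for one of its inputs.
flaky <- function(x) {
  if (x == 2L) stop("boom")
  x
}

# Variant 1: signal progress after FUN() returns.
wrap_after <- function(FUN, p) function(x, ...) {
  res <- FUN(x, ...)
  p()
  res
}

# Variant 2: signal progress whenever the wrapper exits.
wrap_on_exit <- function(FUN, p) function(x, ...) {
  on.exit(p())
  FUN(x, ...)
}

# Run over three inputs, swallowing errors so that all inputs are tried.
run_all <- function(wrapped) {
  lapply(1:3, function(x) tryCatch(wrapped(x), error = function(e) NULL))
}

after <- make_counter()
invisible(run_all(wrap_after(flaky, after$tick)))

exit <- make_counter()
invisible(run_all(wrap_on_exit(flaky, exit$tick)))

after$count()  # 2: the failed iteration never signalled progress
exit$count()   # 3: every iteration signalled, including the failed one
```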
I also haven't decided if progress should be reported when an iteration is starting or finishing (as above), or both. Maybe different scenarios require different progress reporting.
Thank you for explaining your thought process on this topic. It seems to be a more complicated issue than I initially thought. Maybe even setting an environment variable to subscribe to progress signalling would be possible, e.g. R_PROGRESS_INDICATOR = "cli" or R_PROGRESS_INDICATOR = FALSE. This way, one can run the same code on different systems (interactive RStudio session vs HPC) and progress is displayed in a sensible way.
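As a rough sketch of how such a switch could be read in user code: R_PROGRESS_INDICATOR is only the name floated in this comment and, as far as this thread goes, is not recognised by progressr or future.apply. The helper progress_handler_from_env is hypothetical as well.

```r
# Hypothetical helper: map the (made-up) R_PROGRESS_INDICATOR environment
# variable to a handler name, or to NULL when progress is switched off.
progress_handler_from_env <- function(default = NULL) {
  value <- Sys.getenv("R_PROGRESS_INDICATOR", unset = NA_character_)
  # Variable not set or empty: fall back to the caller's default
  if (is.na(value) || !nzchar(value)) return(default)
  # Explicit "off" values disable progress reporting
  if (toupper(value) %in% c("FALSE", "F", "0", "NO", "OFF")) return(NULL)
  value
}

Sys.setenv(R_PROGRESS_INDICATOR = "cli")
on_desktop <- progress_handler_from_env()   # "cli"

Sys.setenv(R_PROGRESS_INDICATOR = "FALSE")
on_cluster <- progress_handler_from_env()   # NULL

Sys.unsetenv("R_PROGRESS_INDICATOR")

# With progressr installed, the result could then be applied by the user:
# handler <- progress_handler_from_env()
# if (!is.null(handler)) progressr::handlers(handler)
```

This keeps the decision with the end user, since the code that signals progress does not change between systems.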