futureverse/progressr

Slow performance with progressr::handler_cli

Opened this issue · 7 comments

I have noticed there is a considerable overhead when using handler_cli. For example,

f <- function(R = 1000) {
    pb <- progressr::progressor(steps = R)
    onerun <- function(...) {
        pb()
        Sys.sleep(0.001)
    }
    system.time(lapply(seq_len(R), onerun))
}

progressr::handlers(progressr::handler_progress)
f() |> print()
##   user  system elapsed                                                       
##  1.594   0.023   2.879 

compared to

progressr::handlers(progressr::handler_cli)
f() |> print()
##   user  system elapsed                                                              
##  9.058   0.021  10.352 

Is this something that could be fixed, or perhaps this need to be taken up with the cli maintainer?

> sessionInfo()
R version 4.3.2 (2023-10-31)
Platform: aarch64-apple-darwin22.6.0 (64-bit)
Running under: macOS Sonoma 14.3.1

Matrix products: default
BLAS:   /Users/kkzh/.asdf/installs/R/4.3.2/lib/R/lib/libRblas.dylib 
LAPACK: /Users/kkzh/.asdf/installs/R/4.3.2/lib/R/lib/libRlapack.dylib;  LAPACK version 3.11.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: Europe/Copenhagen
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
 [1] compiler_4.3.2    R6_2.5.1          cli_3.6.2         hms_1.1.3        
 [5] prettyunits_1.2.0 crayon_1.5.2      vctrs_0.6.5       progress_1.2.3   
 [9] progressr_0.14.0  digest_0.6.34     lifecycle_1.0.4   pkgconfig_2.0.3  
[13] rlang_1.1.3      

Very nice example code, and clear post! I'm not a developer, but thought I'd add a data point in case it's helpful. I do not observe a difference between the two.

progressr::handlers(progressr::handler_progress)
f() |> print()
   user  system elapsed 
  0.384   0.036   1.588 
> progressr::handlers(progressr::handler_cli)
f() |> print()
   user  system elapsed 
  0.381   0.053   1.600 
> sessionInfo()
R version 4.2.2 Patched (2022-11-10 r83330)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 23.04

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/atlas/libblas.so.3.10.3
LAPACK: /usr/lib/x86_64-linux-gnu/atlas/liblapack.so.3.10.3

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8       
 [4] LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C              
[10] LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] colorout_1.2-2

loaded via a namespace (and not attached):
[1] compiler_4.2.2   cli_3.6.1        parallel_4.2.2   progressr_0.14.0 git2r_0.32.0    
[6] digest_0.6.33    rlang_1.1.1     
> 

I also tried a different system, which has very different hardware but similar OS (one is Ubuntu 23.04, the other 23.10), and similarly do not see a difference:

> sessionInfo()
R version 4.2.2 Patched (2022-11-10 r83330)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 23.04

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/atlas/libblas.so.3.10.3
LAPACK: /usr/lib/x86_64-linux-gnu/atlas/liblapack.so.3.10.3

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] colorout_1.2-2

loaded via a namespace (and not attached):
[1] compiler_4.2.2   cli_3.6.1        parallel_4.2.2   progressr_0.14.0 git2r_0.32.0     digest_0.6.33    rlang_1.1.1     
> 

Do you have another system you can test your code on?

I can reproduce the slowness of the cli handler (R 4.3.3 on Linux running R from the terminal);

## Baseline
> progressr::handlers(progressr::handler_void())
> f()
   user  system elapsed 
  0.482   0.026   1.622 

## 'progress' package
> progressr::handlers(progressr::handler_progress)
> f()
   user  system elapsed                                                       
  0.879   0.024   1.994 

## 'cli' package
> progressr::handlers(progressr::handler_cli)
> f()
   user  system elapsed                                                        
  5.174   0.005   6.283

Running

profvis::profvis(f())

gives:

Screenshot from 2024-03-06 09-33-02

which reveals that lots of the overhead comes from time spent in cli:::clii__container_start() and cli:::clii__container_end().

So, yes, it looks like a cli issue. That said, before reporting to them, I would come up with an example that does not involve progressr. Comparing the performance of a vanilla cli progress bar to one from the progress package would probably be a good case to report on.

@kkholst I wonder if it is a regression from cli package from 3.6.1 to 3.6.2. If you make a minimal example without future package and want me to test it, I can test it on 3.6.1 and hopefully fail to reproduce the problem, and then I can upgrade to 3.6.2 and test it there to reproduce it.

Also, it looks like the performance gets worse in the RStudio Console (12s vs 6s). I'm testing with:

> packageVersion("cli")
[1] '3.6.2'

Here is what I get when comparing directly against progressand cli

progressr::handlers(global=TRUE)

f <- function(R = 500,
              type = c("progressr", "progress", "cli")) {
  cli::cli_alert_info(type)
  type <- gsub("\\.(.*)", "", type)
  pb <- switch(type[1],
        progress = progress::progress_bar$new(total = R),
        cli = cli::cli_progress_bar(total = R),
        progressr = progressr::progressor(steps = R),
        NULL
       )
    for (i in seq_len(R)) {
        switch(type[1],
            progress = pb$tick(),
            cli = cli::cli_progress_update(),
            progressr = pb(),
            NULL
        )
        Sys.sleep(0.005)
    }
}

f(type = "none") |>
    system.time() |>
    print()

progressr::handlers(progressr::handler_void)
f(type = "progressr.void") |>
    system.time() |>
    print()

progressr::handlers(progressr::handler_progress)
f(type = "progressr.progress") |>
    system.time() |>
    print()

progressr::handlers(progressr::handler_cli)
f(type = "progressr.cli") |>
    system.time() |>
    print()

f(type = "progress") |>
    system.time() |>
    print()

f(type = "cli") |>
    system.time() |>
    print()
ℹ none
   user  system elapsed
  0.020   0.005   3.105
ℹ progressr.void
   user  system elapsed
  0.499   0.012   3.600
ℹ progressr.progress
   user  system elapsed
  1.473   0.022   4.639
ℹ progressr.cli
   user  system elapsed
  5.046   0.012   8.240
ℹ progress
   user  system elapsed
  0.606   0.015   3.718
ℹ cli
   user  system elapsed
  0.450   0.005   3.541

Note, that progressr with the progress handler also seems to run 25% slower.

Thanks for this. Interesting, this suggests that the extra overhead comes from progressr interacting with cli.

Note that there will always be another layer of overhead when using progressr. All progress updates are signalled as condition in R, which handled by R's condition handlers. That layer in progressr is orchestrating these conditions, and translates them to commands specific to each progress handler package.

Looking at your stats, subtracting the "none" baseline, we have:

dt <- c(none=3.105, progressr.void=3.600, progressr.progress=4.639, progressr.cli=8.240, progress=3.718, cli=3.541)
dt2 <- dt - dt[["none"]]
print(dt2)
#>               none     progressr.void 
#>              0.000              0.495 
#> progressr.progress      progressr.cli 
#>              1.534              5.135 
#>           progress                cli 
#>              0.613              0.436 

Then, the added overhead per progress update (in milliseconds) is:

R <- 500
ms <- 1000*dt2/R
print(ms)
#>               none     progressr.void 
#>              0.000              0.990 
#> progressr.progress      progressr.cli 
#>              3.068             10.270 
#>           progress                cli 
#>              1.226              0.872

The handlers("cli") stands out by costing ~10ms per progress update, when the bare-bones baseline is ~1ms per progress update.

I'll add it to the list to see if there's some obvious that stands out and that can be improved.

As a workaround, see #81 for how to limit how frequently you send progress updates.

I'm sure you're already aware of this, but signalling lots of progress updates for short-running tasks is less of a value compared to long-running tasks. So, in practice, I expect 10ms vs 1ms per progress update makes little difference to the overall processing time in those cases. That said, if it can be improved, why not fix it. Thus far, I haven't profiled or optimized anything, so there's probably something that can be improved.

Another, much simpler, workaround is to use the intrusiveness argument that each progress handler has. Using:

progressr::handlers(progressr::handler_cli(intrusiveness = 10.0))

will cause the handler to render progress 10 times less frequently. So, instead of reporting on all R = 500 progress update, it only reports on every 10:th, i.e. in total 50 of them. That will bring your benchmark results from:

none
   user  system elapsed 
  0.043   0.006   2.606progressr.void
   user  system elapsed 
  0.325   0.017   2.910progressr.progress
   user  system elapsed                                         
  1.027   0.028   3.605progressr.cli
   user  system elapsed                                          
  2.743   0.018   5.309progress
   user  system elapsed                                         
  0.414   0.010   2.974cli
   user  system elapsed 
  0.264   0.011   2.829 

down to:

progressr.cli
   user  system elapsed                                          
  1.659   0.020   4.241