rstudio/reticulate

import inside py_capture_output breaks delay_load

gdurif opened this issue · 6 comments

I am using reticulate::import() with delay_load=TRUE to load a Python package in the .onLoad() function of my R package (so that user can setup their Python session, e.g. using a virtual environment, afterward).

Said Python package is printing to stdout when being imported, which I want to avoid because it is good practice that R package verbosity on loading could be disabled by the user (for quiet loading).

So I tried to wrap my reticulate::import() call inside a reticulate::py_capture_output() call, however the delay loading is not working anymore. Below is a minimum working example:

  • SciPy is not available, direct import fails as expected:
library(reticulate)
scipy <- import("scipy")
## Error in py_module_import(module, convert = convert) : 
##  ModuleNotFoundError: No module named 'scipy'
## Run `reticulate::py_last_error()` for details.
  • SciPy is not available yet but import with delay loading works as expected (e.g. I may call reticulate::use_virtualenv() to setup a Python virtual environment where SciPy is installed afterward):
library(reticulate)
scipy <- import("scipy", delay_load=TRUE)
  • Import with delay loading inside a py_capture_output() call does not work:
library(reticulate)
msg <- py_capture_output({scipy <- import("scipy", delay_load=TRUE)})
## Error in py_module_import(module, convert = convert) : 
##  ModuleNotFoundError: No module named 'scipy'
## Run `reticulate::py_last_error()` for details.

Note: this behavior is maybe expected since reticulate may need to initiate the Python session to call py_capture_output(). In that case, how could I avoid verbosity printing to stdout when importing a Python package during my R package loading ?

Thanks

Edit1: this is a minimum working example, SciPy import produces no output that I would need to capture. My R package is importing another Python package that produces output at import, namely pykeops.

Edit2: this is my R session info:

R version 4.3.2 (2023-10-31)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Arch Linux

Matrix products: default
BLAS:   /usr/lib/libblas.so.3.12.0 
LAPACK: /usr/lib/liblapack.so.3.12.0

locale:
 [1] LC_CTYPE=en_IE.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_IE.UTF-8        LC_COLLATE=en_IE.UTF-8    
 [5] LC_MONETARY=en_IE.UTF-8    LC_MESSAGES=en_IE.UTF-8   
 [7] LC_PAPER=en_IE.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_IE.UTF-8 LC_IDENTIFICATION=C       

time zone: Europe/Paris
tzcode source: system (glibc)

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] reticulate_1.35.0

loaded via a namespace (and not attached):
[1] compiler_4.3.2 Matrix_1.6-1.1 cli_3.6.2      Rcpp_1.0.12    grid_4.3.2    
[6] jsonlite_1.8.8 rlang_1.1.3    png_0.1-8      lattice_0.21-9

What is the output you are trying to suppress? (import("scipy") produces no output for me).

@t-kalinowski This is a minimum working example, I am not importing SciPy in my package but another Python package. Sorry it was not clear, I edited my issue to specify it.

We don't currently have an official API for this. Thinking though how this could be implemented, an R package could probably do something like this (completely untested code):

.onLoad <- function(...) {
  if (reticulate::py_available())
    reticulate::py_capture_output({
      module <<- reticulate::import("module")
    })
  else {
    py_output_context_manager <- NULL
    module <<- import("module", delay_load = list(
      before_load = function() {
        reticulate::py_available(TRUE) # initialize python
        
        # get output tools helper functions
        output_tools <- import("rpytools.output")
        py_output_context_manager <<-
          output_tools$OutputCaptureContext(capture_stdout = TRUE, 
                                            capture_stderr = TRUE)
        
        py_output_context_manager$`__enter__`()
      }
      on_load = function() {
        py_output_context_manager$`__exit__`()
        py_output_context_manager <<- NULL
      },
      on_error = function(error) {
        py_output_context_manager$`__exit__`()
        py_colected_output <- py_output_context_manager$collect_output()
        py_output_context_manager <<- NULL
        
        error$message <- paste0(c(error$message, py_colected_output), sep = "\n")
        stop(error)
      }
    ))
  }
}

However, taking a look at 'pykeops' specifically, seems like this is all that's needed to suppress output:

Sys.setenv("PYKEOPS_VERBOSE" = "0")

Thanks a lot @t-kalinowski! I'll work on that.

Regarding the environment variable setup, I want to catch the output from Python and print it with packageStartupMessage() in R (so that it can be disabled or not using library(xxx, quietly = TRUE/FALSE)), and not just totally disable it.

Worth mentioning if you're going to be setting env variables: if Python is initialized already, changes to environment variables made with Sys.setenv() don't automatically propagate to the Python session. (#1339)

So you'll want to do something like this:

Sys.setenv("PYKEOPS_VERBOSE" = "0")
if(py_available()) 
  import("os")$environ$update(list("PYKEOPS_VERBOSE" = "0"))

I might be mistaken, but I don't think library(xxx, quietly = TRUE/FALSE) suppresses output from packageStartupMessage(), users would need suppressPackageStartupMessages() for that.


With import(delay_load = TRUE), Python will not initialize during your packages .onLoad() call, meaning, you won't be able to capture output from python initializing and pass it packageStartupMessage() before .onLoad() had returned. I believe conventionally packageStartupMessage() is only called from .onLoad(). I don't know if that's just convention or if CRAN enforces it, but either way, I'd recommend a different approach.


I just pushed a small updated to OutputCaptureContext(), making it a little more ergonomic to use. (#1569)

@t-kalinowski thank you very much again.

You are right regarding suppressPackageStartupMessages() and I did not think about delay import that will not work with packageStartupMessage(). I will set the environment variable as you suggested.