targets' use of warnings is redundant, prone to false positives and negatives, and risks (causing other packages to) violating CRAN policy in normal use

Question

targets' use of warnings is redundant, prone to false positives and negatives, and risks (causing other packages to) violating CRAN policy in normal use

cjvanlissa opened this issue a year ago · 2 comments

I'm working on integrating targets with my worcs package, and notice the following gratuitous warning when _targets.R contains the code:

tar_target(
    name = df,
    command = worcs::load_data(to_envir = FALSE)$embeddings
  )

Output:

Warning message:
Do not use load_data() from {devtools} or {pkgload} to load packages or custom functions/globals for `targets`. If you do, custom functions will go to a package environment where `targets` may not track them, and the loaded data will not be available in parallel workers created by tar_make_clustermq() or tar_make_future(). Read https://books.ropensci.org/targets/packages.html#loading-and-configuring-r-packages and https://books.ropensci.org/targets/packages.html#package-based-invalidation for the correct way to load packages for `targets` pipelines. Warnings like this one are important, but if you must suppress them,  you can do so with Sys.setenv(TAR_WARN = "false").

I believe that this warning is poorly implemented; going purely by function name when throwing this warning is prone to false positives, as you see here when a user calls a different function by the same name. Moreover, it is also prone to false negatives if the user calls the offending functions outside of _targets.R. Moreover, this puts me in a position where I risk violating CRAN's policy when I engage in normal use of the targets package: "In principle, packages must pass R CMD check without warnings or significant notes to be admitted to the main CRAN package area. ".

I also do not believe that a warning is appropriate in this context, going by established canon about the use of warnings in packages; e.g.:

"Warnings fall somewhat in between errors and message, and typically indicate that something has gone wrong but the function has been able to at least partially recover." (https://adv-r.hadley.nz/conditions.html) but nothing has gone wrong here.
"avoid R’s warning feature. This is particularly important if you use R in production; when you regularly run R scripts as part of your business process. This is also important if you author R packages. Don’t issue warnings in your own code and treat warnings in 3rd party code as errors that are explicitly suppressed." (https://www.r-bloggers.com/2012/05/a-warning-about-warning/)
"We have adopted a policy to write R code in such a way, that warnings will not occur in the production environment. Hence, if in production a warning arises, this is always unexpected and should be treated as an error. This is achieved by setting options(warn = 2), which converts all warnings into errors (unexpected errors, to be precise)." (https://energizedanalytics.com/how-to-deal-with-errors-and-warnings-in-productive-r-code/)

Finally, the package proposes a way to suppress this warning, namely Sys.setenv(TAR_WARN = "false").. But there are two problems with this: firstly, this environmental variable does not exist until it is set to "false", so I cannot check its value in my package. Secondly, this environmental variable is redundant with options(warn=-1), thus introducing further unnecessary complexity and confusion.

Given that this package is maintained by ropensci, I hope you will give some thought around your policy for throwing unnecessary warnings in production R-code, and consider reimplementing this one (and potentially others).

Prework

[ v ] I understand and agree to help guide.
[ v ] I understand and agree to contributing guide.
[ v ] New features take time and effort to create, and they take even more effort to maintain. So if the purpose of the feature is to resolve a struggle you are encountering personally, please consider first posting a "trouble" or "other" issue so we can discuss your use case and search for existing solutions first.

Answer 1 · 2024-01-27T08:10:42.000Z

Sorry for the rude/to the point message by the way! If I were less rushed, I would have written a more diplomatic message. I think targets is phenomenal and am a fellow open science enthusiast, and would love to have a productive discussion about this topic, because I do think these details matter. Have a good weekend!

Answer 2 · 2024-01-29T14:32:14.000Z

as you see here when a user calls a different function by the same name.

62ef565 uses getNamespace(package)$.__DEVTOOLS__ instead of static code analysis to look for pkgload::load_all(). (Edit: needed to drop that warning entirely due to issues in reverse dependencies.)

I also do not believe that a warning is appropriate in this context,

targets almost never throws warnings. There are a small number of situations where an error is too restrictive but a message is too weak. pkgload::load_all() is one of them.

this environmental variable does not exist until it is set to "false", so I cannot check its value in my package.

You can always check identical(Sys.getenv("TAR_WARN"), "") to see if TAR_WARN is set. Packages can use withr::local_envvar() and withr::with_envvar() to temporarily set environment variables inside function calls without permanently modifying the user's global state.

this environmental variable is redundant with options(warn=-1), thus introducing further unnecessary complexity and confusion.

options(warn=-1) affects all warnings, whereas TAR_WARN applies only to those set by targets. This granularity is useful.