duncantl/CodeDepends

Inconsistent detection of the magrittr dot (.) as an NSE variable

wlandau opened this issue · 3 comments

From ropensci/drake#320:

x <- CodeDepends::getInputs(quote(read_csv("data.csv") %>% mutate(.data = ., 
  foo = bar + baz)))
#> Warning: replacing previous import 'graph::addNode' by 'XML::addNode' when
#> loading 'CodeDepends'
#> Warning: replacing previous import 'graph::plot' by 'graphics::plot' when
#> loading 'CodeDepends'
x@inputs
#> character(0)
x@nsevalVars
#> [1] "."   "bar" "baz"

x <- CodeDepends::getInputs(quote(raw_data %>% select(., starts_with("xyz"))))
x@inputs
#> [1] "raw_data"
x@nsevalVars
#> [1] "."

x <- CodeDepends::getInputs(quote(raw_data %>% filter(.)))
x@inputs
#> [1] "."        "raw_data"
x@nsevalVars
#> character(0)

x <- CodeDepends::getInputs(quote(raw_data %>% filter(complete.cases(.))))
x@inputs
#> [1] "."        "raw_data"
x@nsevalVars
#> character(0)

@wlandau

So filter has different semantics depending on whether you are hitting the Bioconductor or the dplyr version. CodeDepends currently handles this as follows (not perfectly correct, but I suspect it would cover the case here): if dplyr is loaded earlier in the script that CodeDepends is analyzing, it treats the arguments to filter the way dplyr::filter would. Otherwise, it treats the filter call as an ordinary function call.

For example:

> expr = readScript(txt = "library(dplyr); raw_data %>% filter(.)")
> getInputs(expr)[[2]]@inputs
[1] "raw_data"
> getInputs(expr)[[2]]@nsevalVars
[1] "."

You're probably right that, more generally, we might want the dot (.) to always be marked as an NSE variable (or, to be honest, ignored entirely). I can look into doing that, but it will take more work.
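A minimal sketch of the "always NSE" option, done as post-processing outside CodeDepends rather than as a change to CodeDepends itself. The helper name mark_dot_as_nse is hypothetical and not part of CodeDepends; it assumes only the inputs and nsevalVars slots seen in the getInputs() output above. Dropping the union() line would instead ignore the dot entirely.

```r
# Hypothetical helper (not part of CodeDepends): post-process the object
# returned by CodeDepends::getInputs() so that the magrittr dot is never
# reported as a regular input and is always listed as an NSE variable.
mark_dot_as_nse <- function(info) {
  if ("." %in% info@inputs) {
    # Remove the dot from the ordinary inputs...
    info@inputs <- setdiff(info@inputs, ".")
    # ...and record it as an NSE variable (union() avoids duplicates).
    info@nsevalVars <- union(info@nsevalVars, ".")
  }
  info
}

# Usage (requires CodeDepends and magrittr-style code):
# x <- mark_dot_as_nse(CodeDepends::getInputs(quote(raw_data %>% filter(.))))
# x@inputs      # "raw_data", given the output reported above
# x@nsevalVars  # "."
```

Because the normalization happens after the analysis, its result does not depend on whether library(dplyr) appears in the analyzed code, which may matter when expressions are analyzed one at a time rather than as a whole script.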

How is the code getting into CodeDepends? Is drake somehow tracking which libraries are loaded, or does it have a full script to operate on rather than only individual expressions? (Even beyond this issue, CodeDepends will generally behave better, and be more powerful and correct, when it operates on a whole script.)

Unlike traditional Make-like tools, drake focuses on the user's R session rather than on script files. It analyzes the commands in the workflow plan data frame and the "imported" functions loaded in your R session (or in a custom environment you provide). If CodeDepends behaves differently depending on which packages are loaded, targets could be invalidated in unpredictable ways, which does not bode well for drake.