hrbrmstr/rdaradar

Idea: check for overwritten values instead of using a blocklist

Opened this issue · 1 comments

klmr commented

The functions currently listed in dangerous_functions are probably the most insidious. But in principle any (especially commonly used) predefined functions have the same issue, e.g. <-, [, etc. ls() is also a good candidate: personally I basically never use load() but when I do I immediately follow it by running ls(), even though I know that the verbose = TRUE flag would give me the same (it’s just a compulsion I guess).

An alternative approach to consider would be to check whether any loaded name shadows or overwrites an existing name (rather than hard-code the list of known-dangerous functions). I played around with this idea in a toy ‘safeload’ package. The basic logic is as follows:

new_names = load(args[1], envir = quarantine)

sentinel = new.env()  # (1)
hook_funs = list2env(list(.Last = {}, .Last.sys = {}, parent = .GlobalEnv)  # (2)
global_values = mget_uneval(new_names, hook_funs, ifnotfound = list(sentinel), inherits = TRUE)  # (3)
ok = vapply(global_values, identical, logical(1L), sentinel)

if (any(! ok)) {…}

Notes:

  1. new.env() gives us a value that should be (!) non-identical to any existing value.
  2. Adding this environment to the search ensures that .Last() and .Sys.last() are found, even if the user hasn’t currently defined them.
  3. mget_uneval() is like mget() but without evaluating active bindings.
  4. Another difference / potential limitation is the fact that this only checks for names predefined in standard packages, not for whatever else the user might have attached or defined in .GlobalEnv, since it’s run from a standalone, containerised script. A savvy attacker might make an educated guess about other things the user has already defined in their environment, or about packages they are likely to use. For instance, hijacking the names select(), ggplot() etc. has good chances of success.

This is a much better approach than my hastily crafted one.