leeper/margins

Proposed solution to issue with poly() function

ray-p144 opened this issue · 0 comments

Please specify whether your issue is about:

  • a possible bug
  • a question about package functionality
  • a suggested code or documentation change, improvement to the code, or feature request

Background

It seems that the requirement to use stats::poly() rather than just poly() in a model formula has been a persistent issue (#79 #133 #175, and in the prediction package)

Reproducible error:

set.seed(99)

library(margins)

data <- data.frame(y = rnorm(n = 100, mean = 0, sd = 1),
                   x = rnorm(n = 100, mean = 0, sd = 1))

fit <- lm(formula = y ~ poly(x, degree = 2), data = data)
margins(fit)
#> Error in poly(x, degree = 2, coefs = list(alpha = c(-0.087688783657271, : could not find function "poly"


fit2 <- lm(formula = y ~ stats::poly(x, degree = 2), data = data)
margins(fit2)
#> Average marginal effects
#> lm(formula = y ~ stats::poly(x, degree = 2), data = data)
#>        x
#>  0.04987

Proposed solution:

The solution that I would like to propose is to provide a reducemem option for the various methods of the margins() function.

I have debugged the error to be the result of this line of code where the ".Environment" attribute of model[["terms"]] is removed.

I propose that the reducemem option be used to allow the user to turn this feature off so the formula will work as expected (not only for the stats::poly() function, but any other user-defined functions they would like to have in the formula).

Example of working fix using assignInNamespace() function:

fixedfunc <-
    function(model, 
             data = find_data(model, parent.frame()), 
             variables = NULL,
             at = NULL, 
             type = c("response", "link"),
             vcov = stats::vcov(model),
             vce = c("delta", "simulation", "bootstrap", "none"),
             iterations = 50L, # if vce == "bootstrap" or "simulation"
             unit_ses = FALSE,
             eps = 1e-7,
             reducemem = TRUE,
             ...) {
        
        # match.arg()
        type <- match.arg(type)
        vce <- match.arg(vce)
        
        # setup data
        data_list <- build_datalist(data, at = at)
        if (is.null(names(data_list))) {
            names(data_list) <- NA_character_
        }
        at_specification <- attr(data_list, "at_specification")
        
        # identify classes of terms in `model`
        varslist <- find_terms_in_model(model, variables = variables)
        
        # reduce memory profile
        model[["model"]] <- NULL
        
        ## Potential solution
        if (reducemem == TRUE) {
            ## Option 1
            attr(model[["terms"]], ".Environment") <- NULL
            ## Option 2
            # attr(model[["terms"]], ".Environment") <- new.env(parent = parent.env(globalenv()))
        }
        
        # calculate marginal effects
        out <- list()
        for (i in seq_along(data_list)) {
            out[[i]] <- build_margins(model = model, 
                                      data = data_list[[i]], 
                                      variables = variables,
                                      type = type, 
                                      vcov = vcov, 
                                      vce = vce, 
                                      iterations = iterations, 
                                      unit_ses = unit_ses, 
                                      eps = eps,
                                      varslist = varslist,
                                      ...)
            out[[i]][["_at_number"]] <- i
        }
        if (vce == "delta") {
            jac <- do.call("rbind", lapply(out, attr, "jacobian"))
            rownames(jac) <- paste0(rownames(jac), ".", rep(seq_len(length(out)), each = length(unique(rownames(jac)))))
            vc <- jac %*% vcov %*% t(jac)
        } else {
            jac <- NULL
            vc <- NULL
        }
        
        # return value
        structure(do.call("rbind", out), 
                  class = c("margins", "data.frame"), 
                  at = if (is.null(at)) NULL else at_specification,
                  type = type,
                  call = if ("call" %in% names(model)) model[["call"]] else NULL,
                  model_class = class(model),
                  vce = vce, 
                  vcov = vc,
                  jacobian = jac,
                  weighted = FALSE,
                  iterations = if (vce == "bootstrap") iterations else NULL)
    }

bugfunc <- get("margins.lm", envir = asNamespace("margins"))
environment(fixedfunc) <- environment(bugfunc)

assignInNamespace("margins.lm",
                  value = fixedfunc,
                  ns = "margins")

margins(fit, reducemem = FALSE)
#> Average marginal effects
#> lm(formula = y ~ poly(x, degree = 2), data = data)
#>        x
#>  0.04987


margins(fit2, reducemem = FALSE)
#> Average marginal effects
#> lm(formula = y ~ stats::poly(x, degree = 2), data = data)
#>        x
#>  0.04987

Alternative option

An additional thought I had was that rather than setting the ".Environment" attribute to NULL, you can just create a new environment that has the same parents as the global environment by doing:

## Option 2
attr(model[["terms"]], ".Environment") <- new.env(parent = parent.env(globalenv()))

This would prevent creating a massive memory footprint if the user has a large dataset loaded into R, but still allow use of any packages that were attached prior to running the margins() function.

Created on 2021-08-18 by the reprex package (v2.0.1)