Proposed solution to issue with poly() function
ray-p144 opened this issue · 0 comments
Please specify whether your issue is about:
- a possible bug
- a question about package functionality
- a suggested code or documentation change, improvement to the code, or feature request
Background
It seems that the requirement to use stats::poly()
rather than just poly()
in a model formula has been a persistent issue (#79 #133 #175, and in the prediction package)
Reproducible error:
set.seed(99)
library(margins)
data <- data.frame(y = rnorm(n = 100, mean = 0, sd = 1),
x = rnorm(n = 100, mean = 0, sd = 1))
fit <- lm(formula = y ~ poly(x, degree = 2), data = data)
margins(fit)
#> Error in poly(x, degree = 2, coefs = list(alpha = c(-0.087688783657271, : could not find function "poly"
fit2 <- lm(formula = y ~ stats::poly(x, degree = 2), data = data)
margins(fit2)
#> Average marginal effects
#> lm(formula = y ~ stats::poly(x, degree = 2), data = data)
#> x
#> 0.04987
Proposed solution:
The solution that I would like to propose is to provide a reducemem
option for the various methods of the margins()
function.
I have debugged the error to be the result of this line of code where the ".Environment"
attribute of model[["terms"]]
is removed.
I propose that the reducemem
option be used to allow the user to turn this feature off so the formula will work as expected (not only for the stats::poly()
function, but any other user-defined functions they would like to have in the formula).
Example of working fix using assignInNamespace()
function:
fixedfunc <-
function(model,
data = find_data(model, parent.frame()),
variables = NULL,
at = NULL,
type = c("response", "link"),
vcov = stats::vcov(model),
vce = c("delta", "simulation", "bootstrap", "none"),
iterations = 50L, # if vce == "bootstrap" or "simulation"
unit_ses = FALSE,
eps = 1e-7,
reducemem = TRUE,
...) {
# match.arg()
type <- match.arg(type)
vce <- match.arg(vce)
# setup data
data_list <- build_datalist(data, at = at)
if (is.null(names(data_list))) {
names(data_list) <- NA_character_
}
at_specification <- attr(data_list, "at_specification")
# identify classes of terms in `model`
varslist <- find_terms_in_model(model, variables = variables)
# reduce memory profile
model[["model"]] <- NULL
## Potential solution
if (reducemem == TRUE) {
## Option 1
attr(model[["terms"]], ".Environment") <- NULL
## Option 2
# attr(model[["terms"]], ".Environment") <- new.env(parent = parent.env(globalenv()))
}
# calculate marginal effects
out <- list()
for (i in seq_along(data_list)) {
out[[i]] <- build_margins(model = model,
data = data_list[[i]],
variables = variables,
type = type,
vcov = vcov,
vce = vce,
iterations = iterations,
unit_ses = unit_ses,
eps = eps,
varslist = varslist,
...)
out[[i]][["_at_number"]] <- i
}
if (vce == "delta") {
jac <- do.call("rbind", lapply(out, attr, "jacobian"))
rownames(jac) <- paste0(rownames(jac), ".", rep(seq_len(length(out)), each = length(unique(rownames(jac)))))
vc <- jac %*% vcov %*% t(jac)
} else {
jac <- NULL
vc <- NULL
}
# return value
structure(do.call("rbind", out),
class = c("margins", "data.frame"),
at = if (is.null(at)) NULL else at_specification,
type = type,
call = if ("call" %in% names(model)) model[["call"]] else NULL,
model_class = class(model),
vce = vce,
vcov = vc,
jacobian = jac,
weighted = FALSE,
iterations = if (vce == "bootstrap") iterations else NULL)
}
bugfunc <- get("margins.lm", envir = asNamespace("margins"))
environment(fixedfunc) <- environment(bugfunc)
assignInNamespace("margins.lm",
value = fixedfunc,
ns = "margins")
margins(fit, reducemem = FALSE)
#> Average marginal effects
#> lm(formula = y ~ poly(x, degree = 2), data = data)
#> x
#> 0.04987
margins(fit2, reducemem = FALSE)
#> Average marginal effects
#> lm(formula = y ~ stats::poly(x, degree = 2), data = data)
#> x
#> 0.04987
Alternative option
An additional thought I had was that rather than setting the ".Environment"
attribute to NULL
, you can just create a new environment that has the same parents as the global environment by doing:
## Option 2
attr(model[["terms"]], ".Environment") <- new.env(parent = parent.env(globalenv()))
This would prevent creating a massive memory footprint if the user has a large dataset loaded into R, but still allow use of any packages that were attached prior to running the margins()
function.
Created on 2021-08-18 by the reprex package (v2.0.1)