prediction fails with poly() in formula
leeper opened this issue · 5 comments
leeper commented
> x <- glm(Sepal.Length ~ stats::poly(Sepal.Width, 3), data = iris)
> prediction(x)
Average prediction for 150 observations: 5.8433
> prediction(x, at = list(Sepal.Width = 2:4))
Error in stats::poly(Sepal.Width, 3) :
'degree' must be less than number of unique points
> traceback()
11: stop("'degree' must be less than number of unique points")
10: stats::poly(Sepal.Width, 3)
9: eval(predvars, data, env)
8: eval(predvars, data, env)
7: model.frame.default(Terms, newdata, na.action = na.action, xlev = object$xlevels)
6: model.frame(Terms, newdata, na.action = na.action, xlev = object$xlevels)
5: predict.lm(object, newdata, se.fit, scale = residual.scale, type = ifelse(type ==
"link", "response", type), terms = terms, na.action = na.action)
4: predict.glm(model, newdata = out, type = type, se.fit = TRUE,
...)
3: predict(model, newdata = out, type = type, se.fit = TRUE, ...)
2: prediction.glm(x, at = list(Sepal.Width = 2:4))
1: prediction(x, at = list(Sepal.Width = 2:4))
leeper commented
The underlying issue here is that predict()
gives different values when the model uses stats::poly()
versus just poly()
:
require("stats")
data(cars)
# model specified with `poly()`
m <- lm(dist ~ stats::poly(speed, 3), data = cars)
# model specified using modified terms
m2 <- lm(dist ~ speed + I(speed^2) + I(speed^3), data = cars)
# predictions are equivalent
all.equal(predict(m), predict(m2))
# but predictions with new data are not
all.equal(
predict(m, newdata = data.frame(speed = 4:9)),
predict(m2, newdata = data.frame(speed = 4:9))
)
# [1] "Mean relative difference: 0.8405171"
predict(m, newdata = data.frame(speed = 4:9))
# 1 2 3 4 5 6
# -36.600687 -4.520996 19.660440 46.227212 85.462909 147.651122
predict(m, newdata = data.frame(speed = 5:10))
# 1 2 3 4 5 6
# -36.600687 -4.520996 19.660440 46.227212 85.462909 147.651122
predict(m2, newdata = data.frame(speed = 4:9))
# 1 2 3 4 5 6
# 2.760981 7.040541 10.928348 14.485912 17.774747 20.856365
predict(m2, newdata = data.frame(speed = 5:10))
# 1 2 3 4 5 6
# 7.040541 10.928348 14.485912 17.774747 20.856365 23.792277
predict(lm(dist ~ poly(speed, 3), data = cars), newdata = data.frame(speed = 4:9))
# 1 2 3 4 5 6
# 2.760981 7.040541 10.928348 14.485912 17.774747 20.856365
predict(lm(dist ~ poly(speed, 3), data = cars), newdata = data.frame(speed = 5:10))
# 1 2 3 4 5 6
# 7.040541 10.928348 14.485912 17.774747 20.856365 23.792277
If I can get prediction()
to not encounter the "poly not found" error, then the above shouldn't be an issue.
leeper commented
Here's the specific error:
> prediction::prediction(glm(Sepal.Length ~ stats::poly(Sepal.Width, 3), data = iris))
Average prediction for 150 observations: 5.8433
> prediction::prediction(glm(Sepal.Length ~ poly(Sepal.Width, 3), data = iris))
Error in poly(Sepal.Width, 3, coefs = list(alpha = c(3.05733333333333, :
could not find function "poly"
> traceback()
9: eval(predvars, data, env)
8: eval(predvars, data, env)
7: model.frame.default(Terms, newdata, na.action = na.action, xlev = object$xlevels)
6: model.frame(Terms, newdata, na.action = na.action, xlev = object$xlevels)
5: predict.lm(object, newdata, se.fit, scale = residual.scale, type = ifelse(type ==
"link", "response", type), terms = terms, na.action = na.action)
4: predict.glm(model, newdata = out, type = type, se.fit = TRUE,
...)
3: predict(model, newdata = out, type = type, se.fit = TRUE, ...)
2: prediction.glm(glm(Sepal.Length ~ poly(Sepal.Width, 3), data = iris))
1: prediction::prediction(glm(Sepal.Length ~ poly(Sepal.Width, 3),
data = iris))
leeper commented
Probably the bug discussed here: https://stat.ethz.ch/pipermail/r-devel/2018-April/075889.html
leeper commented
Underlying issues in R's safe prediction patched here: wch/r-source@e372854
leeper commented
There were two issues here. One was that poly()
was misbehaving in R itself. That's fixed. Using poly()
rather than stats::poly()
still fails. That's at least just an annoyance rather than a bug.