Lab 6 - Error with Predict
AprilPeck opened this issue · 6 comments
@Dselby86
I am trying to do question 3d using the predict() function:
data2 <- with(data,
data.frame(Democrats = mean(data$Democrats),
Evangelics = mean(data$Evangelics),
Catholics = mean(data$Catholics),
Media = quantile(data$Media, .95),
Merck = mean(data$Merck)))
data2$Adopt_prob <- predict(log, newdata = data2, type = "response")
But keep getting the following error:
Error in `$<-.data.frame`(`*tmp*`, Adopt_prob, value = c(`1` = 0.411174458260132, : replacement has 49 rows, data has 1
My data2 data frame looks good, but it seems like the predict function is trying to return too many rows.
What does your model look like?
I would try to simplify as much as possible to see if you can diagnose the problem yourself.
I'm not sure what purpose with() is serving in your data frame construction since the data.frame() function is used to build a new data frame:
data2 <-
data.frame(
Democrats = mean(data$Democrats),
Evangelics = mean(data$Evangelics),
Catholics = mean(data$Catholics),
Media = quantile(data$Media, .95),
Merck = mean(data$Merck)
)
I suspect it might have affected the resulting object. With your previous code, what did the following return?
library( dplyr )
library( pander )
class( data2 )
dim( data2 )
data2 %>% pander()
Do those change?
@lecy It's giving me the same error, even without the "with" function. (I wasn't sure why it was there either, but it was in the lab sample code and I was trying everything I could think of.)
Class = data.frame
dim = 1 5
data2:
I can just use the formula for this question, but run into the same problem with q3f.
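For reference, "using the formula" here means applying the inverse logit to the linear predictor by hand. A minimal sketch on simulated data, assuming a made-up data frame `toy` with columns `x1`, `x2`, and `y` (these names and values are illustrative only and are not the lab data):

```r
# Simulated data for illustration only; not the lab dataset
set.seed(1)
toy <- data.frame(x1 = rnorm(50), x2 = rnorm(50))
toy$y <- rbinom(50, size = 1, prob = plogis(0.5 + 1.2 * toy$x1 - 0.8 * toy$x2))

fit <- glm(y ~ x1 + x2, data = toy, family = "binomial")

# Scenario: x1 at its mean, x2 at its 95th percentile
# (the leading 1 multiplies the intercept)
x_new <- c(1, mean(toy$x1), quantile(toy$x2, 0.95))

# Linear predictor, then inverse logit: p = 1 / (1 + exp(-eta))
eta <- sum(coef(fit) * x_new)
p_by_hand <- 1 / (1 + exp(-eta))

# Same scenario through predict(); the two values should agree
new_row <- data.frame(x1 = mean(toy$x1), x2 = quantile(toy$x2, 0.95))
p_predict <- predict(fit, newdata = new_row, type = "response")

all.equal(unname(p_by_hand), unname(p_predict))
```

The by-hand route works regardless of how the model's variables are named, which is why it sidesteps the predict() error, but it has to be repeated for every scenario.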
What is your model?
Got it - the problem is the variable names.
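The predict() function matches the variable names in the model with the column names in the new dataset in order to create the y-hat value.
You should use the first version of glm() below (m), where you use the variable names directly and tell it which data frame you are using (data=dat).
Otherwise the variables in your model will be named data$Democrats instead of Democrats, etc. (dat$Democrats in my example below). Since data2 has no columns with those names, predict() falls back to the original data$ vectors and returns one prediction per original row, which is why you get 49 values instead of 1.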
m <- glm( Adoption ~ Democrats + Evangelics + Catholics + Media + Merck, data=dat, family="binomial" )
m2 <- glm( dat$Adoption ~ dat$Democrats + dat$Evangelics + dat$Catholics + dat$Media + dat$Merck, family="binomial" )
> summary( m )
Call:
glm(formula = Adoption ~ Democrats + Evangelics + Catholics +
Media + Merck, family = "binomial", data = dat)
Deviance Residuals:
Min 1Q Median 3Q Max
-1.8892 -0.5959 -0.2235 0.4907 2.4277
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 7.7176493 2.9623236 2.605 0.00918 **
Democrats -0.7160462 1.9068860 -0.376 0.70728
Evangelics -6.0438723 2.6817890 -2.254 0.02422 *
Catholics 1.5925736 2.6708639 0.596 0.55099
Media -0.0151480 0.0047374 -3.198 0.00139 **
Merck -0.0002314 0.0003379 -0.685 0.49348
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 61.906 on 48 degrees of freedom
Residual deviance: 37.604 on 43 degrees of freedom
AIC: 49.604
Number of Fisher Scoring iterations: 6
> summary( m2 )
Call:
glm(formula = dat$Adoption ~ dat$Democrats + dat$Evangelics +
dat$Catholics + dat$Media + dat$Merck, family = "binomial")
Deviance Residuals:
Min 1Q Median 3Q Max
-1.8892 -0.5959 -0.2235 0.4907 2.4277
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 7.7176493 2.9623236 2.605 0.00918 **
dat$Democrats -0.7160462 1.9068860 -0.376 0.70728
dat$Evangelics -6.0438723 2.6817890 -2.254 0.02422 *
dat$Catholics 1.5925736 2.6708639 0.596 0.55099
dat$Media -0.0151480 0.0047374 -3.198 0.00139 **
dat$Merck -0.0002314 0.0003379 -0.685 0.49348
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 61.906 on 48 degrees of freedom
Residual deviance: 37.604 on 43 degrees of freedom
AIC: 49.604
Number of Fisher Scoring iterations: 6
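The two summaries show identical estimates; only the coefficient names differ, and that difference is what breaks predict(). A minimal sketch that reproduces the behavior on simulated data, assuming a made-up 49-row data frame `dat` with columns `x` and `y` (not the lab data; `m_good`, `m_bad`, and `new_row` are illustrative names):

```r
# Simulated data for illustration only; not the lab dataset
set.seed(42)
dat <- data.frame(x = rnorm(49))
dat$y <- rbinom(49, size = 1, prob = plogis(0.3 + dat$x))

# Same model fit two ways
m_good <- glm(y ~ x, data = dat, family = "binomial")   # term is named "x"
m_bad  <- glm(dat$y ~ dat$x, family = "binomial")       # term is named "dat$x"

names(coef(m_good))
names(coef(m_bad))

# One-row scenario: x held at its mean
new_row <- data.frame(x = mean(dat$x))

# Column "x" in new_row matches the model term "x": one prediction
p_good <- predict(m_good, newdata = new_row, type = "response")
length(p_good)

# new_row has no column that satisfies "dat$x", so dat$x is looked up in the
# workspace and all 49 original values are used; R only issues a warning
p_bad <- suppressWarnings(predict(m_bad, newdata = new_row, type = "response"))
length(p_bad)
```

Assigning `p_bad` to a column of a one-row data frame then fails with the "replacement has 49 rows, data has 1" error from the top of the thread.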