StatsGary/OddsPlotty

glmnet not supported

jkylearmstrong opened this issue · 7 comments

library('OddsPlotty')
library('glmnet')

library('mlbench')

data("BreastCancer", package = "mlbench")
#Use complete cases of breast cancer
breast <- BreastCancer[complete.cases(BreastCancer), ] #Create a copy
breast <- breast[, -1]
head(breast, 10)
#Convert the class to a factor - Beningn (0) and Malignant (1)
breast$Class <- factor(breast$Class)
str(breast)

for(i in 1:9) {
  breast[, i] <- as.numeric(as.character(breast[, i]))
}
#Loops through the first columns - 1 to 9 and changes them from factors to a numerical representation
str(breast)

library('caret')

glm_model <- caret::train(Class ~ .,
                          data = breast,
                          method = "glmnet",
                          family = "binomial")

OddsPlotty::odds_plot(glm_model$finalModel,
                      title = "Odds Plot",
                      subtitle = "Showing odds of cancer based on various factors")
#> Error in UseMethod("vcov") :  no applicable method for 'vcov' applied to an object of class "c('lognet', 'glmnet')"

Yes, I just checked the CRAN version downloading it and you need to make sure that when you train your model you use caret. Plus, the method="glmnet" should be just "glm".

glmnet is a logistic regression model - this example will compute the odds ratio and errors:

glm_model$finalModel %>%
  broom::tidy() %>%
  group_by(term) %>%
  summarise(
    mean_est = mean(estimate , na.rm=TRUE),
    sd = sd(estimate , na.rm = TRUE),
    .groups = 'keep') %>%
  ungroup() %>%
  mutate(Odds_Ratio = if_else(term != '(Intercept)', exp(mean_est), as.numeric(NA))) %>%
  ggplot(aes(x= Odds_Ratio , y= term)) + 
  geom_point() +
  geom_errorbarh(aes(xmax = exp(mean_est + sd), xmin = exp(mean_est - sd), height = .2)) + 
  ggtitle(expression("Comparing model estimates of Odds Ratios" ~ exp(beta[i])))

There is also an issue with method = "bayesglm"

glm_model <- caret::train(Class ~ .,
                   data = breast,
                   method = "bayesglm") 

the broom package provides a warning:

glm_model$finalModel %>% 
  broom::tidy() %>%
  mutate(Odds_Ratio = if_else(term != '(Intercept)', exp(estimate), as.numeric(NA))) %>%
  ggplot(aes(x= Odds_Ratio , y= term)) + 
  geom_point() +
  geom_errorbarh(aes(xmax = exp(estimate + std.error), xmin = exp(estimate - std.error), height = .2)) + 
  ggtitle(expression("Comparing model estimates of Odds Ratios" ~ exp(beta[i])))

but if we perform the computation by grabbing all of the elements things look the same:

tibble(
  term = names(glm_model$finalModel$coefficients),
  mean_est = glm_model$finalModel$coefficients,
  sd = glm_model[["finalModel"]][["prior.sd"]]
  ) %>%
  mutate(Odds_Ratio = if_else(term != '(Intercept)', exp(mean_est), as.numeric(NA))) %>%
  ggplot(aes(x= Odds_Ratio , y= term)) + 
  geom_point() +
  geom_errorbarh(aes(xmax = exp(mean_est + sd), xmin = exp(mean_est - sd), height = .2)) + 
  ggtitle(expression("Comparing model estimates of Odds Ratios" ~ exp(beta[i])))

@jkylearmstrong the glmnet bootstrapping confidence intervals are not correct for this function. I think by averaging the standard error across each interval is incorrect. I have managed to fix the bayesglm issues, but the confidence limits look very different to the logistic regression:
image