Don't need map() for summarizing models
Line 641 in a8ad4dd
This works fine:
mtcars %>%
group_by(cyl) %>%
summarize(r.sq = summary(lm(mpg ~ wt))$r.squared)
The problem here is not having to learn new paradigms just to do it; it's that you can't easily save intermediate steps, because summarize() wants the right-hand side to be a vector, not a model object.
For example, the following code errors:
mtcars %>%
group_by(cyl) %>%
summarize(m = lm(mpg ~ wt))
And to get it to work, you have to start dealing with list-columns, which is a whole thing:
#this works, but makes a data frame with a list-column
mtcars %>%
group_by(cyl) %>%
summarize(m = list(lm(mpg ~ wt)))
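For what it's worth, here is a minimal sketch of how that list-column can serve as the saved intermediate step without reaching for map(). It assumes only dplyr and base R; the names `models` and `fits` are made up for illustration, and `vapply()` is just one of several ways to pull a number out of each stored model.

```r
library(dplyr)

# Step 1: fit one model per cyl group and keep the fitted objects
# in a list-column so they can be inspected later.
models <- mtcars %>%
  group_by(cyl) %>%
  summarize(m = list(lm(mpg ~ wt)))

# Step 2: extract one number from each stored model.
# vapply() checks that every result is a single numeric value.
fits <- models %>%
  mutate(r.sq = vapply(m, function(fit) summary(fit)$r.squared, numeric(1)))

fits
```

The trade-off is the one described above: the intermediate object is available, but the reader has to be comfortable with a column whose cells are whole model objects.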
To me, this is beautiful, simple, and logical. Split the dataset by cyl. For each part, run the model and extract the model fit, then bind it all together while preserving the name of the split variable in the output. Oh, and you get all the output one could need.
library(dplyr)
library(broom)
mtcars %>%
group_by(cyl) %>%
group_map(.f=~lm(mpg ~ wt, data=.x) %>% glance()) %>%
bind_rows(.id = "cyl")
#> # A tibble: 3 × 13
#> cyl r.squared adj.r…¹ sigma stati…² p.value df logLik AIC BIC devia…³
#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 1 0.509 0.454 3.33 9.32 0.0137 1 -27.7 61.5 62.7 99.9
#> 2 2 0.465 0.357 1.17 4.34 0.0918 1 -9.83 25.7 25.5 6.79
#> 3 3 0.423 0.375 2.02 8.80 0.0118 1 -28.7 63.3 65.2 49.2
#> # … with 2 more variables: df.residual <int>, nobs <int>, and abbreviated
#> # variable names ¹adj.r.squared, ²statistic, ³deviance
Created on 2022-08-06 by the reprex package (v2.0.1)
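One caveat about the output above: the cyl column holds "1", "2", "3" (list positions, stored as character) rather than 4, 6, 8, because group_map() returns an unnamed list and bind_rows(.id = ...) then falls back to positions. A hedged sketch of one way to keep the actual group values is dplyr's group_modify(), which reattaches the grouping key to each result. The name `fit_stats` is made up for illustration, and group_modify() may be marked experimental in your dplyr version.

```r
library(dplyr)
library(broom)

# group_modify() calls the function once per group and expects a data frame
# back; the group key (cyl) is added to each result automatically.
fit_stats <- mtcars %>%
  group_by(cyl) %>%
  group_modify(~ glance(lm(mpg ~ wt, data = .x))) %>%
  ungroup()

fit_stats
```

Here cyl stays numeric with its original values, so the result can be joined back to the data or plotted against cylinder count directly.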
A point I make in the essay is that intermediate steps are GOOD for beginning coders, the group my essay focuses on.