matloff/TidyverseSkeptic

Don't need map() for summarizing models

Aariq commented

This works fine:

mtcars %>% 
  group_by(cyl) %>% 
  summarize(r.sq = summary(lm(mpg ~ wt))$r.squared)

The problem here is not having to learn new paradigms just to do it; it's that you can't easily save intermediate steps, because summarize() wants the right-hand side to be a vector, not a model object.

For example, the following code errors:

mtcars %>% 
  group_by(cyl) %>% 
  summarize(m = lm(mpg ~ wt))

And to get it to work, you have to start dealing with list-columns, which is a whole thing:

#this works, but makes a data frame with a list-column
mtcars %>% 
  group_by(cyl) %>% 
  summarize(m = list(lm(mpg ~ wt)))
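
One way to keep the intermediate models around and still get at the fit statistics might look like the sketch below. It assumes dplyr is installed. The names `models`, `fit_summary`, and `r.sq` are illustrative only, and base R's sapply() stands in for map().

```r
library(dplyr)

# Step 1: fit one model per cyl group and keep the fitted objects in a list-column
models <- mtcars %>%
  group_by(cyl) %>%
  summarize(m = list(lm(mpg ~ wt)))

# Step 2: pull a statistic out of each saved model (illustrative helper column)
fit_summary <- models %>%
  mutate(r.sq = sapply(m, function(fit) summary(fit)$r.squared))

fit_summary
```

The saved `models` object can be reused for other steps, such as residual plots or coefficients, without refitting.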

This is, to me, beautiful, simple, and logical. Split the dataset by cyl. For each part, run the following steps: fit the model, extract the model fit statistics, and finally bind everything together, while preserving the name of the split variable in the output. Oh, and you get all the output one could need.

library(dplyr)
library(broom)
mtcars %>% 
    group_by(cyl) %>%
    group_map(.f=~lm(mpg ~ wt, data=.x) %>% glance()) %>%
    bind_rows(.id = "cyl")
#> # A tibble: 3 × 13
#>   cyl   r.squared adj.r…¹ sigma stati…² p.value    df logLik   AIC   BIC devia…³
#>   <chr>     <dbl>   <dbl> <dbl>   <dbl>   <dbl> <dbl>  <dbl> <dbl> <dbl>   <dbl>
#> 1 1         0.509   0.454  3.33    9.32  0.0137     1 -27.7   61.5  62.7   99.9 
#> 2 2         0.465   0.357  1.17    4.34  0.0918     1  -9.83  25.7  25.5    6.79
#> 3 3         0.423   0.375  2.02    8.80  0.0118     1 -28.7   63.3  65.2   49.2 
#> # … with 2 more variables: df.residual <int>, nobs <int>, and abbreviated
#> #   variable names ¹​adj.r.squared, ²​statistic, ³​deviance

Created on 2022-08-06 by the reprex package (v2.0.1)
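
Note that the cyl column in the output above holds 1, 2, 3 rather than 4, 6, 8: group_map() returns an unnamed list, so bind_rows(.id = "cyl") labels the rows by list position. If the actual group values are wanted, one possible variation is group_modify(), which keeps the grouping key in the result. This is only a sketch, assuming dplyr and broom are installed; `fits` is an illustrative name.

```r
library(dplyr)
library(broom)

fits <- mtcars %>%
  group_by(cyl) %>%
  # .x is the data for one group; the cyl key is added back automatically
  group_modify(~ glance(lm(mpg ~ wt, data = .x))) %>%
  ungroup()

# The cyl column now carries the real group values
fits$cyl
```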

A point I make in the essay is that intermediate steps are GOOD for beginning coders, the group my essay focuses on.