WillemSleegers/tidystats-v0.3

`add_stats()` first argument: model or results

Closed this issue · 2 comments

What should be the first argument for add_stats()?

Currently, the model output is the first argument. My initial reasoning for that was that it enables the user to pipe the output of an analysis directly into the tidystats list, like so:

# Start with an empty list to collect the results in
results <- list()

results <- lm(extra ~ group, data = sleep) %>%
  add_stats(results, identifier = "M_sleep_lm")

results <- t.test(extra ~ group, data = sleep) %>%
  add_stats(results, identifier = "M_sleep_t_test")

However, Ron Dotsch asked (https://twitter.com/RonDotsch/status/994120973623414784): why not use the results list as the first argument? That way you can run your analyses first and add them to your tidystats list later, like so:

M_sleep_lm <- lm(extra ~ group, data = sleep)
M_sleep_t_test <- t.test(extra ~ group, data = sleep)

results <- results %>%
  add_stats(M_sleep_lm) %>%
  add_stats(M_sleep_t_test)

I think I'm favoring this latter option, for the following reasons:

  • You often want to store a model output in a variable, rather than piping it directly into add_stats(), because you might want to perform summary() on it, use it in predict(), calculate additional stats (e.g., confidence intervals), etc.
  • You do not need to supply an identifier: as of tidystats 0.2, add_stats() uses the variable name as the identifier. That does not work when you pipe the model output directly into add_stats(), because then there is no variable name to take.
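
To illustrate the second point, here is a rough sketch of how an identifier could be derived from the variable name. `add_stats_sketch()` is a hypothetical stand-in written for this example, not the actual tidystats implementation; it assumes the name is captured with `deparse(substitute())` and simply stores the model object in the list.

```r
# Hypothetical stand-in for add_stats(); NOT the tidystats implementation.
# Assumption: the default identifier is the expression passed as `model`.
add_stats_sketch <- function(results, model, identifier = NULL) {
  if (is.null(identifier)) {
    # Capture the unevaluated argument and turn it into a string
    identifier <- paste(deparse(substitute(model)), collapse = "")
  }
  # Store the model under that identifier and return the updated list
  results[[identifier]] <- model
  results
}

results <- list()

M_sleep_lm <- lm(extra ~ group, data = sleep)
M_sleep_t_test <- t.test(extra ~ group, data = sleep)

# Passing stored models: the variable names become the identifiers
results <- add_stats_sketch(results, M_sleep_lm)
results <- add_stats_sketch(results, M_sleep_t_test)
names(results)

# Passing an analysis inline: there is no variable name, so the
# identifier ends up being the deparsed call itself
inline <- add_stats_sketch(list(), lm(extra ~ group, data = sleep))
names(inline)
```

With stored models the identifiers come out as the variable names. With an inline call the identifier is just the text of the call, which is why an explicit identifier is needed in that case.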

I think I also prefer the second option. Then I can just have a long pipe of add_stats() calls at the end of my script, which is neater than having them scattered throughout the script.

I made the switch. =)