`.by` in fmutate and fsummarize arguments?
kylebutts opened this issue · 1 comments
Hi Sebastian,
One new thing that dplyr has added that I love is the ability to pass `.by` in mutate/summarize/filter (skipping `group_by()` and `ungroup()`). Is this a feature that could be added to f... functions?
For example, with masking enabled, the same code now gives different results:
library(tidyverse)
mtcars |>
  subset(mpg > 11) |>
  summarise(
    across(c(mpg, carb, hp), mean),
    qsec_wt = weighted.mean(qsec, wt),
    .by = c("cyl", "vs", "am")
  )
#> cyl vs am mpg carb hp qsec_wt
#> 1 6 0 1 20.56667 4.666667 131.66667 16.33306
#> 2 4 1 1 28.37143 1.428571 80.57143 18.75509
#> 3 6 1 0 19.12500 2.500000 115.25000 19.21275
#> 4 8 0 0 15.98000 2.900000 191.00000 17.01239
#> 5 4 1 0 22.90000 1.666667 84.66667 21.04028
#> 6 4 0 1 26.00000 2.000000 91.00000 16.70000
#> 7 8 0 1 15.40000 6.000000 299.50000 14.55297
library(collapse)
#> collapse 2.0.9, see ?`collapse-package` or ?`collapse-documentation`
set_collapse(mask = "manip")
mtcars |>
  subset(mpg > 11) |>
  summarise(
    across(c(mpg, carb, hp), mean),
    qsec_wt = weighted.mean(qsec, wt),
    .by = c("cyl", "vs", "am")
  )
#> mpg carb hp qsec_wt .by
#> 1 20.73667 2.733333 142.4667 17.74035 cyl
#> 2 20.73667 2.733333 142.4667 17.74035 vs
#> 3 20.73667 2.733333 142.4667 17.74035 am
Created on 2024-02-22 with reprex v2.1.0
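For readers unfamiliar with the feature being requested, here is a minimal sketch of what `.by` does in dplyr itself (dplyr 1.1.0 or later is assumed). The object names `by_arg` and `grouped` are illustrative only.

```r
library(dplyr)

# Per-call grouping: the result comes back ungrouped.
by_arg <- mtcars |>
  summarise(mpg = mean(mpg), .by = c(cyl, am))

# The older spelling: group explicitly, then drop the grouping.
grouped <- mtcars |>
  group_by(cyl, am) |>
  summarise(mpg = mean(mpg), .groups = "drop")

# Same group means; only the row order may differ, because `.by` keeps
# groups in order of first appearance while group_by() sorts them.
all.equal(sort(by_arg$mpg), sort(grouped$mpg))
```

The comparison sorts the values first so that the difference in row order does not matter.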
Hi, I understand the impulse, but I don't think this is very useful to collapse. In `fsummarise()` there is no regrouping, and `fgroup_by()` does not do more than required, so when using the Fast Statistical Functions,
mtcars |>
  subset(mpg > 11) |>
  group_by(cyl, vs, am) |>
  summarise(
    across(c(mpg, carb, hp), fmean),
    qsec_wt = fmean(qsec, wt)
  )
is as efficient as the `.by` solution. There is also `collap(~ cyl + vs + am, w = ~wt, custom = list(fmean_uw = .c(mpg, carb, hp), fmean = "qsec"), keep.col.order = FALSE)`. With `fmutate()`, you have a generalization of the `.by` behavior through the `g` argument of the Fast Statistical Functions, e.g.
mtcars |>
  mutate(across(c(mpg, carb, hp), fmean, list(cyl, vs, am), TRA = "fill"))
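To make the `g` plus `TRA` mechanism concrete, the following sketch checks one column against base R's `ave()`. It assumes collapse 2.0 or later, where `TRA = "fill"` is accepted; the variable names `by_collapse` and `by_base` are illustrative.

```r
library(collapse)

# Group means broadcast back to every row, with no grouped data frame involved.
by_collapse <- with(mtcars, fmean(mpg, list(cyl, vs, am), TRA = "fill"))

# Base R equivalent: ave() applies mean() within each cyl/vs/am cell.
by_base <- with(mtcars, ave(mpg, cyl, vs, am))

all.equal(by_collapse, by_base, check.attributes = FALSE)
```

Because the grouping is passed per call, a different `g` could be supplied to the next function without regrouping the data.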
This of course makes it repetitive to compute multiple expressions with the same grouping (and in that case `fgroup_by()` and `fungroup()` would make sense), but on the other hand you can use different groupings in the same `fmutate()`/`ftransform()` call, or even in a single expression. For example, observing country-sector level trade, you can compute revealed comparative advantage on one line:
exports = data.frame(c = rep(1:10, each = 10),
                     s = rep(1:10, 10),
                     v = abs(rnorm(100)))
exports |>
  mutate(rca = fsum(v, c, TRA = "/") / fsum(v, s, TRA = "/"))
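As a sketch of what the two `TRA = "/"` calls compute individually: each divides `v` by a group total, so the first is a share within `c` and the second a share within `s`. The check below compares them with base R's `ave()`; the seed and the names `share_c` and `share_s` are assumptions added for illustration.

```r
library(collapse)

set.seed(1)  # assumed seed, only so the random draws are repeatable
exports <- data.frame(c = rep(1:10, each = 10),
                      s = rep(1:10, 10),
                      v = abs(rnorm(100)))

# Each value divided by the total of its group, for two different groupings.
share_c <- with(exports, fsum(v, c, TRA = "/"))
share_s <- with(exports, fsum(v, s, TRA = "/"))

# Base R equivalents via ave().
all.equal(share_c, exports$v / ave(exports$v, exports$c, FUN = sum))
all.equal(share_s, exports$v / ave(exports$v, exports$s, FUN = sum))
```

Neither call requires the data frame to be grouped, which is what allows two groupings to appear in one expression.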
Thus I think the absence of regrouping in `fsummarise()`, the availability of `collap()`, and the incorporation of grouping and transformations (including transformation by reference using `set = TRUE`) into Fast Statistical Functions make this feature redundant in collapse.