Group_by bug with purrr::reduce
sousaru opened this issue · 4 comments
I was having a bug while using purrr::reduce
to compute lags in a data.frame and realized it came down to the use of tidytable::group_by
.
I provide a MRE:
data = base::data.frame(x = 1:10, y = 1:10, w = c(rep(1,5),rep(0,5)))
#for variable naming
my_sign_adj = function(k) paste0(base::ifelse(k>=0,'p','n'),abs(k))
#function to compute lags or leads
my_lead = function(x,n) {
#should be grouped already if needed
if(n>0) { return(dplyr::lead(x,n)) } else { return(dplyr::lag(x,-1*n)) }
}
#Calculate multiple lags/leads
calculate_lags = function(df, var, lags){
#should be grouped already if needed
#lags are negative
#leads are positive
#function mapping for each
map_lead = purrr::map(lags, ~ purrr::partial(my_lead, n = .x))
#adjust df
df = dplyr::mutate(df, across(.cols = {{var}},
.fns = map_lead,
.names = "{.col}_t{my_sign_adj(lags)}"))
return(df)
}
purrr::reduce(c('x','y')
,calculate_lags
,1:3
,.init = tidytable::group_by(data, w)
)
# # A tidytable: 10 × 9
# # Groups:
# x y w x_tp1 x_tp2 x_tp3 y_tp1 y_tp2 y_tp3
# <int> <int> <dbl> <int> <int> <int> <int> <int> <int>
# 1 1 1 1 2 3 4 2 3 4
# 2 2 2 1 3 4 5 3 4 5
# 3 3 3 1 4 5 6 4 5 6
# 4 4 4 1 5 6 7 5 6 7
# 5 5 5 1 6 7 8 6 7 8
# 6 6 6 0 7 8 9 7 8 9
# 7 7 7 0 8 9 10 8 9 10
# 8 8 8 0 9 10 NA 9 10 NA
# 9 9 9 0 10 NA NA 10 NA NA
# 10 10 10 0 NA NA NA NA NA NA
purrr::reduce(c('x','y')
,calculate_lags
,1:3
,.init = dplyr::group_by(data, w)
)
# # A tibble: 10 × 9
# # Groups: w [2]
# x y w x_tp1 x_tp2 x_tp3 y_tp1 y_tp2 y_tp3
# <int> <int> <dbl> <int> <int> <int> <int> <int> <int>
# 1 1 1 1 2 3 4 2 3 4
# 2 2 2 1 3 4 5 3 4 5
# 3 3 3 1 4 5 NA 4 5 NA
# 4 4 4 1 5 NA NA 5 NA NA
# 5 5 5 1 NA NA NA NA NA NA
# 6 6 6 0 7 8 9 7 8 9
# 7 7 7 0 8 9 10 8 9 10
# 8 8 8 0 9 10 NA 9 10 NA
# 9 9 9 0 10 NA NA 10 NA NA
# 10 10 10 0 NA NA NA NA NA NA
Hmm it looks like the issue isn't tidytable::group_by()
and it isn't purrr:reduce()
either.
Part of the issue is in calculate_lags()
you're using dplyr::mutate()
/dplyr::across()
on a grouped tidytable - which doesn't work.
library(tidytable)
df <- data.frame(x = 1:3, y = c("a", "a", "b"))
df %>%
tidytable::group_by(y) %>%
tidytable::mutate(x_lag_correct = lag(x)) %>%
dplyr::mutate(x_lag_wrong = lag(x))
#> # A tidytable: 3 × 4
#> # Groups:
#> x y x_lag_correct x_lag_wrong
#> <int> <chr> <int> <int>
#> 1 1 a NA NA
#> 2 2 a 1 1
#> 3 3 b NA 2
However if you change this to the tidytable versions you would run into another issue - you pass map_lead
to across()
, where map_lead
is a list of functions you are dynamically creating. Unfortunately that doesn't work in tidytable::across()
, and there is no way to make it work given the translation constraints of tidytable::across()
.
Thanks! Should have added the fact that using tidytable for across
/mutate
yields a different error.
map_lead
creates a list of purr-style lambda functions, which I thought worked with tidytable::mutate
(across(...))`. Could you elaborate on why it does not? Is this flagged in the documentation?
Thanks for your time!
A list()
call works, but a list passed as a variable doesn't.
library(tidytable)
df <- tidytable(x = 1:3, y = 4:6)
# Works
df %>%
summarize(
across(c(x, y), list(mean, sum))
)
#> # A tidytable: 1 × 4
#> x_1 x_2 y_1 y_2
#> <dbl> <int> <dbl> <int>
#> 1 2 6 5 15
# Fails
funs <- list(mean, sum)
df %>%
summarize(
across(c(x, y), funs)
)
#> Error in funs(x): could not find function "funs"
I think you're right I need to document that difference somewhere.
Could you elaborate on why it does not?
Essentially the idea of tidytable::across()
is it gets expanded to a regular function call - no evaluation occurs until the expansion is completed.
So this...
df %>%
summarize(across(c(x, y), mean))
becomes this...
df %>%
summarize(x = mean(x),
y = mean(y))
To get this to work you have to account for 3 cases
.fns = mean
assumes a function name and gets converted tomean(col)
.fns = ~ mean(.x)
replaces the.x
and gets converted tomean(col)
.fns = list(mean, max)
combines the assumptions from the other 2 - either straight function conversions or converting a lambda function.
Since I'm directly altering the expressions in the background it's tough to know what the intent is when passing a list of functions.