`ply_select` doesn't work for grouped mutate
jkeirstead commented
With dplyr, I often find myself using `mutate` to calculate an item-level value from a grouped aggregate. For example:
flights %>%
  group_by(year) %>%
  mutate(mean_delay = mean(arr_delay),
         std_delay = sd(arr_delay),
         z_delay = (arr_delay - mean_delay) / std_delay)
From the docs, I thought that the first step of the pandas-ply equivalent would be:
(flights
  .groupby('year')
  .ply_select('*',
    mean_delay = X.arr_delay.mean(),
    std_delay = X.arr_delay.std())
)
But when I try this I get the following error:
Traceback (most recent call last):
  File "<pyshell#17>", line 5, in <module>
    sd = X.arr_delay.std()))
TypeError: _ply_select_for_groups() takes exactly 1 argument (4 given)
The problem appears to be that the `'*'` argument does not work when `ply_select` operates on a group.
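For comparison, the result the dplyr snippet produces can be sketched in plain pandas with `groupby(...).transform(...)`, which broadcasts each group's aggregate back onto its rows. This is not pandas-ply's API, only an illustration of the intended output, and the small `flights` frame below is a made-up stand-in for the real data:

```python
import pandas as pd

# Hypothetical stand-in for the flights data (values invented for illustration).
flights = pd.DataFrame({
    "year": [2013, 2013, 2013, 2014, 2014, 2014],
    "arr_delay": [10.0, 20.0, 30.0, 0.0, 5.0, 10.0],
})

# transform() returns a Series aligned with the original rows,
# so each row receives its own group's aggregate (like a grouped mutate).
grouped = flights.groupby("year")["arr_delay"]
result = flights.assign(
    mean_delay=grouped.transform("mean"),
    std_delay=grouped.transform("std"),  # sample standard deviation, like R's sd()
)

# Item-level value computed from the grouped aggregates.
result["z_delay"] = (result["arr_delay"] - result["mean_delay"]) / result["std_delay"]

print(result)
```

All original columns are kept, which is what the `'*'` argument was meant to achieve in the `ply_select` call.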