coursera/pandas-ply

`ply_select` doesn't work for grouped mutate

With dplyr, I often use mutate to calculate an item-level value from a grouped aggregate. For example:

flights %>%
  group_by(year) %>%
  mutate(mean_delay = mean(arr_delay),
         std_delay = sd(arr_delay),
         z_delay = (arr_delay - mean_delay)/std_delay)

From the docs, I thought that the first step of the pandas-ply equivalent would be:

(flights
  .groupby('year')
  .ply_select('*',
    mean_delay = X.arr_delay.mean(),
    std_delay = X.arr_delay.std())
)

But when I try this I get the following error:

Traceback (most recent call last):
  File "<pyshell#17>", line 5, in <module>
    sd = X.arr_delay.std()))
TypeError: _ply_select_for_groups() takes exactly 1 argument (4 given)

The problem appears to be that the '*' argument is not accepted when ply_select operates on a grouped DataFrame.
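
For reference, here is a minimal sketch of the result I'm after, written in plain pandas rather than pandas-ply. It uses `groupby(...).transform`, which broadcasts each group's aggregate back onto the rows of that group. The toy `flights` frame and its values are made up for illustration; only the column names come from the example above. This is a possible workaround, not a statement about how pandas-ply is meant to handle it.

```python
import pandas as pd

# Hypothetical toy data standing in for the real flights table.
flights = pd.DataFrame({
    "year": [2013, 2013, 2013, 2014, 2014, 2014],
    "arr_delay": [10.0, 20.0, 30.0, 5.0, 15.0, 25.0],
})

# transform() returns a Series aligned with the original rows,
# so each row receives the aggregate of its own group.
delay_by_year = flights.groupby("year")["arr_delay"]
result = flights.assign(
    mean_delay=delay_by_year.transform("mean"),
    std_delay=delay_by_year.transform("std"),  # sample std (ddof=1), like R's sd()
)

# Item-level z-score computed from the grouped aggregates.
result["z_delay"] = (result["arr_delay"] - result["mean_delay"]) / result["std_delay"]
print(result)
```

All original rows and columns are kept, which is what I expected the '*' argument to do in the grouped ply_select call.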