groupby sum does not allow multidimensional grouping by default
Hi,
I was surprised by the behavior of `sum` on a `LinearExpressionGroupby` object. I want to check whether this is expected or a bug. If it is the intended behavior, I would like it explained, as I couldn't figure it out from the documentation.
Here is what is going on. I have a variable where one of the dimensions is time-based (per date), and I need to set constraints at a weekly level. So I tried to do it by making weekly groups along the time dimension:
```python
# I know the argument could be shortened to the string 'time.week', but that emits a bunch of warnings
grouped_per_week = vars.groupby(vars.coords['time'].dt.isocalendar().week)
```
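For readers unfamiliar with this kind of weekly grouping, here is a minimal sketch of the same idea on a plain xarray `DataArray` instead of linopy variables. The dates, the dimension names `dim1`/`dim3` and the all-ones data are made up for illustration; the only library feature assumed is xarray's `dt.isocalendar()` accessor, which returns a Dataset whose `week` variable (a DataArray named `week`) can serve as the grouping key.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical daily data: 14 days starting Monday 2024-01-01, i.e. ISO weeks 1 and 2
time = pd.date_range("2024-01-01", periods=14, freq="D")
da = xr.DataArray(
    np.ones((14, 2, 3)),
    dims=("time", "dim1", "dim3"),
    coords={"time": time},
)

# ISO week number for every timestamp; the resulting DataArray is named "week"
week = da.coords["time"].dt.isocalendar().week

# Group along time by week; the default sum reduces only over the grouped dimension
weekly = da.groupby(week).sum()
print(weekly.sizes)
```

Each weekly cell here sums seven days of ones, while `dim1` and `dim3` are left untouched.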
Next, I wanted to sum each group over all dimensions except the group dimension and one other dimension, so I naturally did this:
```python
grouped_per_week.sum(['dim1', 'dim2', 'dim4'])
```
I expected to get a `LinearExpression` with `group` and `dim3` as dimensions and the correct sums. But it doesn't work like that. What did work is this:
```python
grouped_per_week.sum().sum(['dim1', 'dim2', 'dim4'])
```
Note the extra `sum()`: the first call returns a `LinearExpression`, and the second `sum` over the listed dimensions gives the correct result.
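Because summation is linear, the order of the two reductions should not matter: summing over the non-time dimensions first and then grouping by week gives the same numbers as grouping first and summing afterwards. The sketch below checks this on a plain xarray `DataArray` filled with random integers; the data and dimension names are hypothetical and this is not linopy code, so whether the reordering is convenient for a linopy expression is untested here.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical data: 14 days (ISO weeks 1 and 2) plus three extra dimensions
time = pd.date_range("2024-01-01", periods=14, freq="D")
rng = np.random.default_rng(0)
da = xr.DataArray(
    rng.integers(0, 10, size=(14, 2, 3, 4)),
    dims=("time", "dim1", "dim3", "dim4"),
    coords={"time": time},
)
week = da.coords["time"].dt.isocalendar().week

# Group first (reducing over time within each week), then sum the other dims
two_step = da.groupby(week).sum().sum(["dim1", "dim4"])

# Reduce the other dims first, then group the remaining (time, dim3) array by week
reduce_first = da.sum(["dim1", "dim4"]).groupby(week).sum()

# Same values regardless of order; transpose so the dimension order matches
xr.testing.assert_equal(
    two_step.transpose("week", "dim3"), reduce_first.transpose("week", "dim3")
)
```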
Hey @aurelije, thanks as always for reporting technical issues. Can you try:
```python
grouped_per_week.sum(['dim1', 'dim2', 'dim4'], use_fallback=True)
```
@FabianHofmann that gives me: `TypeError: LinearExpressionGroupby.sum() got multiple values for argument 'use_fallback'`
If I pass the list of dimensions as the `dim` keyword argument, I get: `TypeError: LinearExpressionGroupby.sum.<locals>.func() got an unexpected keyword argument 'dim'`
I get the same with the `dims` keyword argument.
Gotcha. I fear that the `dims` argument of `sum` is not supported right now (even though it just maps to the xarray function when the fallback is disabled). If I see correctly, the only way to control the dims is to include them explicitly in the `groupby` argument (i.e. use a DataArray that spans all of the dims in `['dim1', 'dim2', 'dim4']`).
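A minimal xarray sketch of that suggestion, assuming xarray's multidimensional groupby semantics: every dimension of the grouping array is reduced, and the remaining dimensions are kept. The weekly labels are broadcast over the dimensions that should be summed away (`dim1` and `dim4` in this hypothetical shape), so only `week` and `dim3` survive. This is plain xarray on numeric data, not linopy's `LinearExpressionGroupby`.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical data: 14 days (ISO weeks 1 and 2) plus three extra dimensions
time = pd.date_range("2024-01-01", periods=14, freq="D")
da = xr.DataArray(
    np.ones((14, 2, 3, 4)),
    dims=("time", "dim1", "dim3", "dim4"),
    coords={"time": time},
)
week = da.coords["time"].dt.isocalendar().week

# Broadcast the weekly labels over every dimension that should be summed away.
# dim3 is deliberately left out of the grouping array, so it is kept in the result.
template = da.isel(dim3=0, drop=True)  # dims: time, dim1, dim4
labels = week.broadcast_like(template).rename("week")

# All dims of `labels` (time, dim1, dim4) are reduced within each week
weekly = da.groupby(labels).sum()
print(weekly.sizes)  # expected dims: week and dim3
```

Each result cell sums 7 days × 2 × 4 = 56 ones, which is what the two-step `sum().sum([...])` would also give.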
@FabianHofmann so for now the only workaround is to use sum twice?
No, not necessarily. See, for example:
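```python
import pandas as pd
import xarray as xr
from linopy import Model

m = Model()
# 2 x 3 variable with lower bound 0 and upper bounds from the transposed DataFrame
z = m.add_variables(0, pd.DataFrame([[1, 2], [3, 4], [5, 6]]).T, name="z")
expr = 1 * z
# group labels spanning both dimensions of z, so both are summed within each group
groups = xr.DataArray([[1, 1, 2], [1, 3, 3]], coords=z.coords)
grouped = expr.groupby(groups).sum(use_fallback=True)
```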
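To see what the grouping in this example computes, here is a plain-xarray sketch that replaces the variable `z` with its numeric upper bounds (an illustrative stand-in, not linopy). Since the label array covers both dimensions, each label sums every cell that carries it. xarray requires the grouping DataArray to have a name, so `name="group"` is an addition made here for the sketch.

```python
import pandas as pd
import xarray as xr

# Numeric stand-in for z: same 2 x 3 shape and coords (dims "dim_0", "dim_1")
data = xr.DataArray(pd.DataFrame([[1, 2], [3, 4], [5, 6]]).T)

# Same labels as in the example, aligned on data's dims and coords
groups = xr.DataArray(
    [[1, 1, 2], [1, 3, 3]], dims=data.dims, coords=data.coords, name="group"
)

# Both dims belong to the grouping array, so both are reduced within each label
totals = data.groupby(groups).sum()
print(totals.values)
```

With `data` equal to `[[1, 3, 5], [2, 4, 6]]`, the totals are 1 + 3 + 2 = 6 for label 1, 5 for label 2, and 4 + 6 = 10 for label 3.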
Closing this; please reopen if needed.