groupby sum not allowing multidimensional grouping per default

Question

Closed this issue 3 months ago · 6 comments

Hi,

I was surprised by the behavior of sum on a LinearExpressionGroupby object. I want to check whether this is expected or a bug. If it is the correct behavior, I would like to have it explained, as I couldn't figure it out from the documentation.

Here is what is going on. I have a variable where one of the dimensions is time-based (per date), and I need to set constraints at the weekly level. So I tried to do that by making weekly groups along the time dimension:

# I know the argument could be shortened to the string 'time.week', but that emits a bunch of warnings
grouped_per_week = vars.groupby(vars.coords['time'].dt.isocalendar().week)

The next thing I wanted to do was to sum each group over all dimensions except the group dimension and one other dimension, so I naturally did this:


grouped_per_week.sum(['dim1', 'dim2', 'dim4'])

What I expected was a LinearExpression with the group and dim3 as dimensions and the correct sums. But it doesn't work like that. What did work is this:


grouped_per_week.sum().sum(['dim1', 'dim2', 'dim4'])

Notice the extra sum: the first one returns a LinearExpression, and the second one, over the remaining dimensions, gives the correct result.
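
To make the expected shape concrete, here is a minimal sketch of the same two-step pattern on a plain numeric xarray DataArray rather than a linopy expression. The data, the dimension sizes, and the names data, per_week and weekly are made up for illustration; only the grouping key is taken from the question. It says nothing about what LinearExpressionGroupby.sum itself supports.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical data: 14 daily steps (ISO weeks 1 and 2 of 2024) and two other dimensions.
time = pd.date_range("2024-01-01", periods=14, freq="D")
data = xr.DataArray(
    np.ones((14, 2, 3)),
    coords={"time": time, "dim1": [0, 1], "dim3": ["a", "b", "c"]},
    dims=("time", "dim1", "dim3"),
)

# Same grouping key as in the question: the ISO week number of each date.
week = data.coords["time"].dt.isocalendar().week

# Step 1: sum within each week, which replaces "time" with "week".
per_week = data.groupby(week).sum()

# Step 2: sum over the remaining dimension that should not survive.
weekly = per_week.sum("dim1")
```

Here weekly keeps only the week and dim3 dimensions, which is the layout the question expects from the linopy expression.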

Answer 1 · 2024-06-04T11:20:45.000Z

Hey @aurelije, thanks as always for reporting technical issues. Can you try

grouped_per_week.sum(['dim1', 'dim2', 'dim4'], use_fallback=True)

Answer 2 · 2024-06-04T11:35:51.000Z

@FabianHofmann that gives me: TypeError: LinearExpressionGroupby.sum() got multiple values for argument 'use_fallback'
If I pass the list of dimensions as the dim keyword argument, I get: TypeError: LinearExpressionGroupby.sum.<locals>.func() got an unexpected keyword argument 'dim'
I get the same error with the dims keyword argument.

Answer 3 · 2024-06-04T12:22:41.000Z

Gotcha, I fear that the dims argument in sum is not supported right now (even though it just maps to the xarray function when the fallback is disabled). If I see correctly, the only way to control the dims is to include them explicitly in the groupby argument, i.e. use a DataArray that has all the dims in ['dim1', 'dim2', 'dim4'].
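
One possible reading of this suggestion is to broadcast the week labels over the other dimensions so that the grouping array itself spans them. The sketch below only builds such a label array with plain xarray. The template array, its sizes, and the single extra dimension dim1 are assumptions for illustration, and whether linopy's groupby accepts the result is not verified here.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical template with the dimensions the grouping array should cover.
time = pd.date_range("2024-01-01", periods=14, freq="D")
template = xr.DataArray(
    np.zeros((14, 2)),
    coords={"time": time, "dim1": [0, 1]},
    dims=("time", "dim1"),
)

# One-dimensional week labels along "time", as in the question.
week = template.coords["time"].dt.isocalendar().week

# Broadcast the labels so every (time, dim1) entry carries its week number.
groups, _ = xr.broadcast(week, template)
```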

Answer 4 · 2024-06-04T12:42:24.000Z

@FabianHofmann so for now the only workaround is to use sum twice?

Answer 5 · 2024-06-05T14:47:56.000Z

No, not necessarily. See for example:

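from linopy import Model
import pandas as pd
import xarray as xr  # needed for xr.DataArray below

m = Model()
# 2 x 3 variable with lower bound 0 and the given upper bounds
z = m.add_variables(0, pd.DataFrame([[1, 2], [3, 4], [5, 6]]).T, name="z")

expr = 1 * z
# group labels with the same shape and coordinates as z
groups = xr.DataArray([[1, 1, 2], [1, 3, 3]], coords=z.coords)
grouped = expr.groupby(groups).sum(use_fallback=True)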

Answer 6 · 2024-07-05T15:16:32.000Z

closing this, please reopen if needed