pydata/xarray

Resample with default argument

Closed this issue · 6 comments

huard commented

Code Sample, a copy-pastable example if possible

import numpy as np
import pandas as pd
import xarray as xr
time = pd.date_range('2000-01-01', freq='D', periods=365 * 3)
ds = xr.Dataset({'foo': ('time', np.arange(365 * 3)), 'time': time})
ds.foo.resample(time=None)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-34-d7109c181d10> in <module>()
----> 1 ds.foo.resample(time=None)

/home/david/src/anaconda3/lib/python2.7/site-packages/xarray/core/common.pyc in resample(self, freq, dim, how, skipna, closed, label, base, keep_attrs, **indexer)
    678                             "was passed %r" % dim)
    679         group = DataArray(dim, [(dim.dims, dim)], name=RESAMPLE_DIM)
--> 680         grouper = pd.Grouper(freq=freq, closed=closed, label=label, base=base)
    681         resampler = self._resample_cls(self, group=group, dim=dim_name,
    682                                        grouper=grouper,

TypeError: __init__() got an unexpected keyword argument 'base'

Problem description

Although None is the default value for freq (v0.10.6), actually passing None as the freq raises an error.

Expected Output

I would like resample(time=None) to return ds.foo itself, or a DataArrayResample instance that includes the entire array.
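Until the library handles this case, the first expected behaviour (getting the array back unchanged) can be approximated on the caller's side. The following is only a minimal sketch: `resample_mean` is a hypothetical helper, not part of xarray, and it assumes the caller wants a mean over each bin when a frequency is given.

```python
def resample_mean(obj, freq=None, dim="time"):
    """Return obj unchanged when freq is None, else the mean over each bin.

    Hypothetical helper: obj is expected to behave like an xarray
    DataArray or Dataset, i.e. to expose resample(**indexer).
    """
    if freq is None:
        # No frequency requested: skip resampling instead of passing
        # freq=None down to the grouper, which is what raises the error.
        return obj
    # Keyword form (time='2D') is the one used in the example above.
    return obj.resample(**{dim: freq}).mean()
```

With the example above, `resample_mean(ds.foo)` would return `ds.foo` itself, while `resample_mean(ds.foo, '2D')` would go through the normal resampling path.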

Output of xr.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.14.final.0
python-bits: 64
OS: Linux
OS-release: 4.15.0-30-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_CA.UTF-8
LOCALE: None.None

xarray: 0.10.6
pandas: 0.23.0
numpy: 1.14.3
scipy: 1.0.0
netCDF4: 1.3.1
h5netcdf: 0.5.1
h5py: 2.8.0
Nio: None
zarr: None
bottleneck: 1.2.1
cyordereddict: 1.0.0
dask: 0.17.5
distributed: 1.21.8
matplotlib: 2.2.2
cartopy: None
seaborn: 0.8.1
setuptools: 39.2.0
pip: 9.0.3
conda: 4.4.11
pytest: 3.6.0
IPython: 5.7.0
sphinx: 1.7.4

I got the same error when resampling a dataset. xarray was installed through pip in a Python 3.8 environment:

Python 3.8.18 (default, Sep 11 2023, 13:40:15) 
Type 'copyright', 'credits' or 'license' for more information
IPython 8.12.2 -- An enhanced Interactive Python. Type '?' for help.

In [1]: import xarray as xr

In [2]: import numpy as np

In [3]: import pandas as pd

In [4]: xr.__version__
Out[4]: '2023.1.0'

In [5]: time = pd.date_range('2000-01-01', freq='H', periods=125)

In [6]: ds = xr.Dataset({'foo': ('time', np.arange(125)), 'time': time})

In [7]: ds.resample({'time':'6H'}, offset='0H').max()
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[7], line 1
----> 1 ds.resample({'time':'6H'}, offset='0H').max()

File ~/anaconda3/envs/test/lib/python3.8/site-packages/xarray/core/dataset.py:9224, in Dataset.resample(self, indexer, skipna, closed, label, base, offset, origin, keep_attrs, loffset, restore_coord_dims, **indexer_kwargs)
   9161 """Returns a Resample object for performing resampling operations.
   9162 
   9163 Handles both downsampling and upsampling. The resampled
   (...)
   9220 .. [1] http://pandas.pydata.org/pandas-docs/stable/timeseries.html#offset-aliases
   9221 """
   9222 from xarray.core.resample import DatasetResample
-> 9224 return self._resample(
   9225     resample_cls=DatasetResample,
   9226     indexer=indexer,
   9227     skipna=skipna,
   9228     closed=closed,
   9229     label=label,
   9230     base=base,
   9231     offset=offset,
   9232     origin=origin,
   9233     keep_attrs=keep_attrs,
   9234     loffset=loffset,
   9235     restore_coord_dims=restore_coord_dims,
   9236     **indexer_kwargs,
   9237 )

File ~/anaconda3/envs/test/lib/python3.8/site-packages/xarray/core/common.py:993, in DataWithCoords._resample(self, resample_cls, indexer, skipna, closed, label, base, offset, origin, keep_attrs, loffset, restore_coord_dims, **indexer_kwargs)
    983         grouper = CFTimeGrouper(
    984             freq=freq,
    985             closed=closed,
   (...)
    990             offset=offset,
    991         )
    992     else:
--> 993         grouper = pd.Grouper(
    994             freq=freq,
    995             closed=closed,
    996             label=label,
    997             base=base,
    998             offset=offset,
    999             origin=origin,
   1000             loffset=loffset,
   1001         )
   1002 group = DataArray(
   1003     dim_coord, coords=dim_coord.coords, dims=dim_coord.dims, name=RESAMPLE_DIM
   1004 )
   1005 return resample_cls(
   1006     self,
   1007     group=group,
   (...)
   1011     restore_coord_dims=restore_coord_dims,
   1012 )

File ~/anaconda3/envs/test/lib/python3.8/site-packages/pandas/core/resample.py:1663, in TimeGrouper.__init__(self, freq, closed, label, how, axis, fill_method, limit, kind, convention, origin, offset, group_keys, **kwargs)
   1660 # always sort time groupers
   1661 kwargs["sort"] = True
-> 1663 super().__init__(freq=freq, axis=axis, **kwargs)

TypeError: __init__() got an unexpected keyword argument 'base'

In [8]: xr.__version__
Out[8]: '2023.1.0'

In [9]: pd.__version__
Out[9]: '2.0.3'

Oddly, in an environment created with conda, I don't get this issue:

Python 3.8.18 (default, Sep 11 2023, 13:40:15) 
Type 'copyright', 'credits' or 'license' for more information
IPython 8.12.2 -- An enhanced Interactive Python. Type '?' for help.

In [1]: import xarray as xr

In [2]: import numpy as np

In [3]: import pandas as pd

In [4]: time = pd.date_range('2000-01-01', freq='H', periods=125)

In [5]: ds = xr.Dataset({'foo': ('time', np.arange(125)), 'time': time})

In [6]: ds.resample({'time':'6H'}, offset='0H').max()
Out[6]: 
<xarray.Dataset>
Dimensions:  (time: 21)
Coordinates:
  * time     (time) datetime64[ns] 2000-01-01 2000-01-01T06:00:00 ... 2000-01-06
Data variables:
    foo      (time) int64 5 11 17 23 29 35 41 47 ... 89 95 101 107 113 119 124

In [7]: xr.__version__
Out[7]: '2023.1.0'

In [8]: pd.__version__
Out[8]: '1.5.3'

I guess the critical difference here is the pandas version (2.0.3 with pip, 1.5.3 with conda).
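This fits the traceback: the `TimeGrouper.__init__` signature shown for pandas 2.0.3 has no `base` parameter, while xarray 2023.1.0 always passes `base=` to `pd.Grouper`. A small sketch of how the combinations reported in this thread could be checked at import time follows. The table holds only pairs actually observed here, and the function names are made up for illustration.

```python
from importlib.metadata import PackageNotFoundError, version

# (xarray version, pandas version) -> result reported in this thread.
OBSERVED = {
    ("2023.1.0", "2.0.3"): "TypeError",
    ("2023.1.0", "1.5.3"): "ok",
    ("2023.6.0", "2.0.3"): "ok",
}


def observed_result(xarray_version, pandas_version):
    """Look up a version pair; pairs not reported here are 'untested'."""
    return OBSERVED.get((xarray_version, pandas_version), "untested")


def installed_pair():
    """Return (xarray, pandas) versions of the current environment, or None."""
    try:
        return version("xarray"), version("pandas")
    except PackageNotFoundError:
        # One of the two packages is not installed in this environment.
        return None


pair = installed_pair()
if pair is not None:
    print(pair, "->", observed_result(*pair))
```

Anything reported as "untested" says nothing either way; it only means that pair does not appear in the sessions above.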

Note that I don't get the error when using Python > 3.8:

Python 3.11.5 (main, Sep 11 2023, 13:23:44) [GCC 11.2.0]
Type 'copyright', 'credits' or 'license' for more information
IPython 8.15.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: import xarray as xr

In [2]: import numpy as np

In [3]: import pandas as pd

In [4]: time = pd.date_range('2000-01-01', freq='H', periods=125)

In [5]: ds = xr.Dataset({'foo': ('time', np.arange(125)), 'time': time})

In [6]: ds.resample({'time':'6H'}, offset='0H').max()
Out[6]: 
<xarray.Dataset>
Dimensions:  (time: 21)
Coordinates:
  * time     (time) datetime64[ns] 2000-01-01 2000-01-01T06:00:00 ... 2000-01-06
Data variables:
    foo      (time) int64 5 11 17 23 29 35 41 47 ... 89 95 101 107 113 119 124

In [7]: xr.__version__
Out[7]: '2023.6.0'

In [8]: pd.__version__
Out[8]: '2.0.3'

Does this reproduce on the latest xarray version?

I don't know.

The latest version is currently not available through pip for Python 3.8:

[...]:~$pip install xarray==2023.9.0
ERROR: Ignored the following versions that require a different python version: 2023.2.0 Requires-Python >=3.9; 2023.3.0 Requires-Python >=3.9; 2023.4.0 Requires-Python >=3.9; 2023.4.1 Requires-Python >=3.9; 2023.4.2 Requires-Python >=3.9; 2023.5.0 Requires-Python >=3.9; 2023.6.0 Requires-Python >=3.9; 2023.7.0 Requires-Python >=3.9; 2023.8.0 Requires-Python >=3.9; 2023.9.0 Requires-Python >=3.9
ERROR: Could not find a version that satisfies the requirement xarray==2023.9.0 (from versions: 0.7.0, 0.7.1, 0.7.2, 0.8.0rc1, 0.8.0, 0.8.1, 0.8.2, 0.9.0rc1, 0.9.0, 0.9.1, 0.9.2, 0.9.3, 0.9.4, 0.9.5, 0.9.6, 0.10.0rc1, 0.10.0rc2, 0.10.0, 0.10.1, 0.10.2, 0.10.3, 0.10.4, 0.10.5, 0.10.6, 0.10.7, 0.10.8, 0.10.9, 0.11.0, 0.11.1, 0.11.2, 0.11.3, 0.12.0, 0.12.1, 0.12.2, 0.12.3, 0.13.0, 0.14.0, 0.14.1, 0.15.0, 0.15.1, 0.16.0, 0.16.1, 0.16.2, 0.17.0, 0.18.0, 0.18.1, 0.18.2, 0.19.0, 0.20.0, 0.20.1, 0.20.2, 0.21.0, 0.21.1, 2022.3.0, 2022.6.0rc0, 2022.6.0, 2022.9.0, 2022.10.0, 2022.11.0, 2022.12.0, 2023.1.0)
ERROR: No matching distribution found for xarray==2023.9.0

Has xarray dropped python 3.8 support lately?

Yes, xarray dropped 3.8 earlier this year, a bit prematurely, but is now in line with NEP 29.
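The pip error above already shows the selection pip makes: it skips every release whose Requires-Python excludes the running interpreter and falls back to the newest remaining one. A rough sketch of that filtering is below, using only the releases named in that output. `MIN_PYTHON` and `newest_installable` are illustrative names, and `None` just means the output reported no constraint for that release.

```python
import sys

# Minimum Python per release, copied from the pip error above.
# None: pip reported no Requires-Python conflict for that release.
MIN_PYTHON = {
    "2022.12.0": None,
    "2023.1.0": None,
    "2023.2.0": (3, 9),
    "2023.3.0": (3, 9),
    "2023.4.0": (3, 9),
    "2023.4.1": (3, 9),
    "2023.4.2": (3, 9),
    "2023.5.0": (3, 9),
    "2023.6.0": (3, 9),
    "2023.7.0": (3, 9),
    "2023.8.0": (3, 9),
    "2023.9.0": (3, 9),
}


def newest_installable(python_version, min_python=MIN_PYTHON):
    """Return the newest release whose minimum Python is satisfied, or None."""
    candidates = [
        release
        for release, minimum in min_python.items()
        if minimum is None or tuple(python_version) >= minimum
    ]
    # Compare releases numerically rather than as strings.
    return max(
        candidates,
        key=lambda release: tuple(int(part) for part in release.split(".")),
        default=None,
    )


print(newest_installable(sys.version_info[:2]))
```

For Python 3.8 this picks 2023.1.0, the version that ended up installed next to pandas 2.0.3 in the failing environment.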

OK, this might explain why.

For example, with Python 3.11 and a newer version of xarray, it works:

Python 3.11.5 (main, Sep 11 2023, 13:23:44) [GCC 11.2.0]
Type 'copyright', 'credits' or 'license' for more information
IPython 8.15.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: import numpy as np

In [2]: import pandas as pd

In [3]: import xarray as xr

In [4]: time = pd.date_range('2000-01-01', freq='H', periods=125)

In [5]: ds = xr.Dataset({'foo': ('time', np.arange(125)), 'time': time})

In [6]: ds.resample({'time':'6H'}, offset='0H').max()
Out[6]: 
<xarray.Dataset>
Dimensions:  (time: 21)
Coordinates:
  * time     (time) datetime64[ns] 2000-01-01 2000-01-01T06:00:00 ... 2000-01-06
Data variables:
    foo      (time) int64 5 11 17 23 29 35 41 47 ... 89 95 101 107 113 119 124

In [7]: xr.__version__
Out[7]: '2023.6.0'

I guess I will have to drop support for python 3.8 myself...