Resampling functions do not work for calendars that allow overlapping intervals
geek-yang opened this issue · 0 comments
Currently, if we create a calendar with overlapping intervals and try to resample data with it, for instance:
import numpy as np
import pandas as pd
import s2spy.time
# define calendar
calendar = s2spy.time.AdventCalendar(anchor=(11, 30), freq='90d', n_targets=1)
calendar.set_max_lag(5, allow_overlap=True)
# example timeseries
time_index = pd.date_range('20171020', '20211001', freq='15d')
random_data = np.random.random(len(time_index))
example_series = pd.Series(random_data, index=time_index)
# map calendar to data
calendar = calendar.map_to_data(example_series)
# resample data
resampled_series = s2spy.time.resample(calendar, example_series)
it will trigger the following error:
---------------------------------------------------------------------------
InvalidIndexError Traceback (most recent call last)
/home/yangliu/AI4S2S/s2spy/notebooks/tutorial_time.ipynb Cell 20 in <cell line: 1>()
----> 1 resampled_series = s2spy.time.resample(calendar, example_series)
      2 resampled_series
File ~/AI4S2S/s2spy/s2spy/_resample.py:229, in resample(mapped_calendar, input_data)
225 #utils.check_timeseries(input_data)
226 #utils.check_input_frequency(mapped_calendar, input_data)
228 if isinstance(input_data, PandasData):
--> 229 resampled_data = resample_pandas(mapped_calendar, input_data)
230 else:
231 resampled_data = resample_xarray(mapped_calendar, input_data)
File ~/AI4S2S/s2spy/s2spy/_resample.py:96, in resample_pandas(calendar, input_data)
93 bins = resample_bins_constructor(calendar.get_intervals())
95 interval_index = pd.IntervalIndex(bins["interval"])
---> 96 interval_groups = interval_index.get_indexer(input_data.index)
97 # except Exception as error:
98 # print(error)
99 # finally:
100 # interval_groups, _ = interval_index.get_indexer_non_unique(input_data.index)
101 interval_means = input_data.groupby(interval_groups).mean()
File ~/miniconda3/lib/python3.8/site-packages/pandas/core/indexes/base.py:3721, in Index.get_indexer(self, target, method, limit, tolerance)
...
-> 3721 raise InvalidIndexError(self._requires_unique_msg)
3723 if len(target) == 0:
3724 return np.array([], dtype=np.intp)
InvalidIndexError: cannot handle overlapping indices; use IntervalIndex.get_indexer_non_unique
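The same error can be reproduced with pandas alone, which suggests the problem is not specific to s2spy. A minimal sketch, assuming a pandas version that exposes `pandas.errors.InvalidIndexError`; the integer intervals and points are made up for illustration:

```python
import pandas as pd
from pandas.errors import InvalidIndexError

# Two made-up intervals that overlap on (5, 10]
intervals = pd.IntervalIndex.from_tuples([(0, 10), (5, 15)], closed="right")
points = pd.Index([2, 7, 12])

print(intervals.is_overlapping)

try:
    # The point 7 lies in both intervals, so no unique label exists for it
    intervals.get_indexer(points)
    raised = False
except InvalidIndexError as error:
    raised = True
    print(error)
```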
I notice that we use pd.IntervalIndex.get_indexer() to label each data point with the interval it falls in, and then call groupby().mean() to perform the resampling. However, pd.IntervalIndex.get_indexer() cannot handle overlapping intervals, since a data point that falls in two intervals cannot be given a single label.
The error message suggests using IntervalIndex.get_indexer_non_unique instead, but that does not work here either: the index array it returns is hard to interpret, and the documentation does not explain it clearly. I would like to investigate this further, as it may be a chance to contribute to pandas. In the meantime I will address this issue.
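One possible way around the single-label limitation is to skip the indexer entirely and build a boolean mask per interval, so a data point can contribute to more than one interval. The sketch below is only an illustration, not the s2spy fix: `resample_overlapping` is a hypothetical helper name, the intervals and data are made up, and it assumes right-closed intervals.

```python
import numpy as np
import pandas as pd


def resample_overlapping(intervals: pd.IntervalIndex, series: pd.Series) -> pd.Series:
    """Mean of `series` within each interval (hypothetical helper).

    A data point that falls in several intervals is counted in each of them.
    Assumes right-closed intervals, i.e. (left, right].
    """
    means = []
    for interval in intervals:
        # Boolean mask of the points that fall inside this interval
        mask = (series.index > interval.left) & (series.index <= interval.right)
        means.append(series[mask].mean())
    return pd.Series(means, index=intervals)


# Made-up example data: six points, ten days apart, with values 1.0 to 6.0
time_index = pd.date_range("2020-01-01", periods=6, freq="10D")
example_series = pd.Series(np.arange(1.0, 7.0), index=time_index)

# Two made-up intervals that overlap between 2020-01-15 and 2020-01-25
intervals = pd.IntervalIndex.from_arrays(
    left=pd.to_datetime(["2019-12-31", "2020-01-15"]),
    right=pd.to_datetime(["2020-01-25", "2020-02-15"]),
    closed="right",
)

result = resample_overlapping(intervals, example_series)
print(result)
```

Here the point on 2020-01-21 lies in both intervals and is included in both means. The loop is linear in the number of intervals, which is probably acceptable for a calendar with a modest number of intervals but may be slow for very large ones.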