What's the meaning of "ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions"
wiz21b opened this issue · 14 comments
Hello,
I am trying to use your package. I pass it a pandas Series of events. Those events are storms. I provide a time series of storm durations (in hours) and storm begin times (the index of the series). There can be several storms at the same time, and there are times without storms. Then I run:
model = EVA(pd.Series(durations, np.sort(dates))) # Dates are not important, durations are.
model.get_extremes(method="BM", block_size="10D")
model.plot_extremes()
which ends up with:
ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (731,) + inhomogeneous part.
I believe you expect data at regular intervals (and so I must fill in some gaps). Correct? I'll continue fiddling a bit. This issue is more about giving an error message that a novice can understand.
More information. I understand the error comes out of pandas, not your code directly. Just for information, my data looks like:
model = EVA(pd.Series(durations, np.sort(dates)))
print(dates, dates.dtype)
print(durations, durations.dtype)
['2002-01-01T16:00:00.000000000' '2002-01-01T20:00:00.000000000'
'2002-01-02T03:00:00.000000000' ... '2004-10-02T23:00:00.000000000'
'2004-10-03T07:00:00.000000000' '2004-10-03T13:00:00.000000000'] datetime64[ns]
[20. 18. 7. ... 4. 5. 7.] float64
Full stack trace:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Input In [115], in <module>
----> 1 model.get_extremes(method="BM", block_size="10D")
2 model.plot_extremes()
File ~/.local/lib/python3.9/site-packages/pyextremes/eva.py:452, in EVA.get_extremes(self, method, extremes_type, **kwargs)
450 message = f"for method='{method}' and extremes_type='{extremes_type}'"
451 logger.debug("extracting extreme values %s", message)
--> 452 self.__extremes = get_extremes(
453 method=method,
454 ts=self.data,
455 extremes_type=extremes_type,
456 **kwargs,
457 )
458 self.__extremes_method = method
459 self.__extremes_type = extremes_type
File ~/.local/lib/python3.9/site-packages/pyextremes/extremes/extremes.py:59, in get_extremes(ts, method, extremes_type, **kwargs)
13 """
14 Get extreme events from time series.
15
(...)
56
57 """
58 if method == "BM":
---> 59 return get_extremes_block_maxima(
60 ts=ts,
61 extremes_type=extremes_type,
62 **kwargs,
63 )
64 if method == "POT":
65 return get_extremes_peaks_over_threshold(
66 ts=ts,
67 extremes_type=extremes_type,
68 **kwargs,
69 )
File ~/.local/lib/python3.9/site-packages/pyextremes/extremes/block_maxima.py:148, in get_extremes_block_maxima(ts, extremes_type, block_size, errors, min_last_block)
137 warnings.warn(
138 message=f"{empty_intervals} blocks contained no data",
139 category=NoDataBlockWarning,
140 )
142 logger.debug(
143 "successfully collected %d extreme events, found %s no-data blocks",
144 len(extreme_values),
145 empty_intervals,
146 )
--> 148 return pd.Series(
149 data=extreme_values,
150 index=pd.Index(data=extreme_indices, name=ts.index.name or "date-time"),
151 dtype=np.float64,
152 name=ts.name or "extreme values",
153 ).fillna(np.nanmean(extreme_values))
File ~/.local/lib/python3.9/site-packages/pandas/core/series.py:439, in Series.__init__(self, data, index, dtype, name, copy, fastpath)
437 data = data.copy()
438 else:
--> 439 data = sanitize_array(data, index, dtype, copy)
441 manager = get_option("mode.data_manager")
442 if manager == "block":
File ~/.local/lib/python3.9/site-packages/pandas/core/construction.py:570, in sanitize_array(data, index, dtype, copy, raise_cast_failure, allow_2d)
567 data = list(data)
569 if dtype is not None or len(data) == 0:
--> 570 subarr = _try_cast(data, dtype, copy, raise_cast_failure)
571 else:
572 subarr = maybe_convert_platform(data)
File ~/.local/lib/python3.9/site-packages/pandas/core/construction.py:760, in _try_cast(arr, dtype, copy, raise_cast_failure)
755 subarr = maybe_cast_to_integer_array(arr, dtype)
756 else:
757 # 4 tests fail if we move this to a try/except/else; see
758 # test_constructor_compound_dtypes, test_constructor_cast_failure
759 # test_constructor_dict_cast2, test_loc_setitem_dtype
--> 760 subarr = np.array(arr, dtype=dtype, copy=copy)
762 except (ValueError, TypeError):
763 if raise_cast_failure:
ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (731,) + inhomogeneous part.
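The last frame of the trace is NumPy's `np.array(arr, dtype=dtype)` call, so the message can be read independently of pyextremes: NumPy was handed a list in which at least one element is itself a sequence rather than a single number. A minimal sketch of that situation, using made-up values rather than the data from this issue:

```python
import numpy as np

# A flat list of scalars converts cleanly to a 1-D float array.
ok = np.array([20.0, 18.0, 7.0], dtype=np.float64)

# A "ragged" list: the second element is a sequence, not a scalar,
# so the list cannot be packed into a 1-D float array.
ragged = [20.0, [18.0, 18.5], 7.0]

try:
    np.array(ragged, dtype=np.float64)
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```

The exact wording of the message depends on the NumPy version, but it begins with "setting an array element with a sequence".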
After cleaning my data, that is, making sure all dates are represented and "non-existing" data is set to zero, the program runs. But this makes me nervous: filling the gaps with zeros treats zero as an actual observation when it is not.
@wiz21b can you share your data so that I can reproduce your error? This is not meant to happen because EVA pre-processes the data during initialization - this may be a scenario I didn't account for.
Also data doesn't have to be at regular intervals.
what was the solution? I'm having the same issue
@coastalmodeler I have never heard back from @wiz21b so I don't know if the issue is resolved. I can reopen this issue for you if you post details about your error.
Cool thanks. I'm getting the same error as wiz21b when I run the get_extremes command. I also get the error when I try to execute some of the plot functions and POT functions. I haven't been able to figure out why.
I'm following the exact tutorial case but with data from a different NOAA station, using the noaa_coops package to download the data into a DataFrame. See code below:
tide_gauge=noaa_coops.Station(8775237)
#https://api.tidesandcurrents.noaa.gov/api/prod/#products
df_water_levels=tide_gauge.get_data(
begin_date="20040406",
end_date="20220925",
product="water_level",
datum="NAVD",
units="english",
time_zone="LST")
I then normalize the dataset by adjusting for RSLR:
measured_rslr=5.54*0.00328084 #ft/yr
df_water_levels_corrected=df_water_levels['water_level'].copy().sort_index(ascending=True).astype(float).dropna()
df_water_levels_corrected=df_water_levels_corrected-(df_water_levels_corrected.index.array-pd.to_datetime("1992"))/pd.to_timedelta("365.2425D")*measured_rslr
am=pyextremes.EVA(df_water_levels_corrected)
Everything works up until this point and here is the command that results in the error:
am.get_extremes(method="BM", block_size="365.2425D",errors="ignore")
am.get_extremes(method="BM", block_size="365.2425D",errors="ignore")
C:\Users: NoDataBlockWarning: 1 blocks contained no data
warnings.warn(
Traceback (most recent call last):
Input In [94] in <cell line: 1>
am.get_extremes(method="BM", block_size="365.2425D",errors="ignore")
File ~.conda\envs\work\lib\site-packages\pyextremes\eva.py:452 in get_extremes
self.__extremes = get_extremes(
File ~.conda\envs\work\lib\site-packages\pyextremes\extremes\extremes.py:59 in get_extremes
return get_extremes_block_maxima(
File ~.conda\envs\work\lib\site-packages\pyextremes\extremes\block_maxima.py:148 in get_extremes_block_maxima
return pd.Series(
File ~.conda\envs\work\lib\site-packages\pandas\core\series.py:451 in __init__
data = sanitize_array(data, index, dtype, copy)
File ~.conda\envs\work\lib\site-packages\pandas\core\construction.py:594 in sanitize_array
subarr = _try_cast(data, dtype, copy, raise_cast_failure)
File ~.conda\envs\work\lib\site-packages\pandas\core\construction.py:784 in _try_cast
subarr = np.array(arr, dtype=dtype, copy=copy)
ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (18,) + inhomogeneous part.
@coastalmodeler thank you for your input. In order to help you I'll need to be able to reproduce your error. Please provide the following:
- OS
- Python version
- pyextremes version
- numpy version
- pandas version
In addition to that I'll need a complete code snippet which can be run as is. For example:
import pandas as pd
import pyextremes
data = pd.read_csv("data.csv")
model = pyextremes.EVA(data)
model.get_extremes()
And provide a link to your data.csv. You can also make a GitHub gist with a Jupyter notebook if that's what you prefer.
Thank you for your quick response. See info and code below.
- OS: Microsoft Windows 10
- Python Version: 3.8.13
- pyextremes version: 2.2.4
- numpy version: 1.21.5
- pandas version: 1.4.3
- noaa_coops version: 0.1.9
Here's the code. Note that there is no CSV file; I'm using noaa-coops to download the data directly into Python from the API. The noaa-coops wrapper can be found here: https://pypi.org/project/noaa-coops/
import noaa_coops as nc
import pyextremes
import numpy as np
import pandas as pd
tide_gauge=nc.Station(8775237)
#https://api.tidesandcurrents.noaa.gov/api/prod/#products
df_water_levels=tide_gauge.get_data(
begin_date="20040406",
end_date="20220925",
product="water_level",
datum="NAVD",
units="english",
time_zone="LST")
measured_rslr=5.54*0.00328084
df_water_levels_corrected=df_water_levels['water_level'].copy().sort_index(ascending=True).astype(float).dropna()
df_water_levels_corrected=df_water_levels_corrected-(df_water_levels_corrected.index.array-pd.to_datetime("1992"))/pd.to_timedelta("365.2425D")*measured_rslr
am=pyextremes.EVA(df_water_levels_corrected)
am.get_extremes(method="BM",errors="ignore")
I think I found the issue. There were duplicate time steps in the NOAA dataset. Once I removed those the code works as intended. I'd guess that was the same problem @wiz21b was having. Thank you for your time.
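For anyone checking their own data, duplicate time steps can be detected and dropped with pandas before the series is passed to `EVA`. This is a minimal sketch on a small hypothetical series (the values and the name `water_level` are illustrative, not the NOAA download), and it keeps the first reading for each repeated timestamp, which is only one of several reasonable choices:

```python
import pandas as pd

# Hypothetical stand-in for a downloaded series: 01:00 appears twice.
index = pd.to_datetime(
    [
        "2004-04-06 00:00",
        "2004-04-06 01:00",
        "2004-04-06 01:00",
        "2004-04-06 02:00",
    ]
)
water_level = pd.Series([1.0, 1.2, 1.3, 0.9], index=index, name="water_level")

# Flag every repeat of a timestamp after its first occurrence.
duplicate_mask = water_level.index.duplicated(keep="first")
n_duplicates = int(duplicate_mask.sum())

# Keep only the first reading for each timestamp.
deduplicated = water_level[~duplicate_mask]

print(n_duplicates, deduplicated.index.is_unique)
```

If the repeated readings differ, averaging them with `water_level.groupby(level=0).mean()` may be more appropriate than keeping the first one.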
@coastalmodeler thank you for posting your solution here. It was an issue with the EVA class not removing duplicates. I have included a fix in the latest release.
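One plausible way duplicate timestamps lead to this particular error is sketched below. It is an assumption about the mechanism, not code taken from pyextremes: looking up a label that occurs more than once in a pandas index returns a Series instead of a scalar, and a list mixing scalars with such Series cannot be converted to a float array.

```python
import numpy as np
import pandas as pd

# Hypothetical series in which the 01:00 timestamp occurs twice.
index = pd.to_datetime(
    ["2004-04-06 00:00", "2004-04-06 01:00", "2004-04-06 01:00"]
)
series = pd.Series([1.0, 1.3, 1.3], index=index)

# A label that occurs once yields a single value.
unique_lookup = series.loc[index[0]]

# A label that occurs twice yields a Series with two rows.
duplicate_lookup = series.loc[index[1]]

# Collecting both kinds of result produces a ragged list.
collected = [unique_lookup, duplicate_lookup]
try:
    np.array(collected, dtype=np.float64)
    raised = False
except ValueError:
    raised = True

print(type(duplicate_lookup).__name__, raised)
```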
Hi, just to add some information for anyone who might have a similar problem. I was getting the same error, even though I had hourly data. Upon closer inspection, I noticed that the time step between samples (dt) was not exactly 1 hour: dt.max()-dt.min() was np.float64(1.1641532182693481e-10). Creating a new, perfectly spaced index solved the problem.
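The workaround described above can be sketched as follows. The series here is hypothetical (hourly timestamps with a few nanoseconds of drift), and the sketch assumes the samples really are meant to be exactly one hour apart with none missing; it does not establish why the drift would trigger the error.

```python
import numpy as np
import pandas as pd

one_hour = pd.Timedelta(hours=1)

# Hypothetical hourly timestamps that drift by a few nanoseconds.
exact = pd.date_range("1990-01-01", periods=6, freq=one_hour)
jitter = pd.to_timedelta([0, 3, -2, 5, 0, -4], unit="ns")
drifting = pd.Series(np.arange(6, dtype=float), index=exact + jitter)

# Measure how much the time step varies across the series.
steps = drifting.index.to_series().diff().dropna()
step_spread = steps.max() - steps.min()

# Rebuild a perfectly spaced index and reattach the same values.
regular_index = pd.date_range(
    start=drifting.index[0], periods=len(drifting), freq=one_hour
)
regular = pd.Series(drifting.to_numpy(), index=regular_index)

print(step_spread, regular.index.to_series().diff().dropna().nunique())
```

Rebuilding the index this way silently shifts every sample, so it is only safe when the series has no gaps and the drift is known to be numerical noise.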
@MBendoni did you have this issue with the latest version? This is not supposed to happen, your data can be spaced at any intervals - pyextremes should be able to handle that. Can you please provide instructions to reproduce this error?
@georgebv the pyextremes version is 2.3.2. To reproduce the error I think I would need to send you the netCDF file where the data are stored. In any case, here is a snippet of code that produces the error:
import netCDF4 as nc
import pandas as pd
import numpy as np
import pyextremes as pxtr
file_1 = 'myfile.nc'
ds_1 = nc.Dataset(file_1)
ref_date = np.datetime64('1990-01-01 00:00:00')
ny = 30
nd = ny*365*24
t_1 = ref_date + ds_1.variables['time'][0:nd].data.astype('timedelta64[D]')
var_name = 'wl'
v_1 = ds_1.variables[var_name][0:nd].data.flatten()
v_1[v_1<0]=0
s_1 = pd.Series(data=v_1, index=t_1)
ex_1 = pxtr.get_extremes(s_1, "POT", threshold=0.5, r="12h")