What's the meaning of "ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions"

Question

wiz21b opened this issue 3 years ago · 14 comments

Hello,

I am trying to use your package. I pass it a pandas Series of events, where each event is a storm: the values are storm durations (in hours) and the index holds the storm start times. Several storms can occur at the same time, and there are periods without storms. Then I run:

model = EVA(pd.Series(durations, np.sort(dates))) # Dates are not important, durations are.
model.get_extremes(method="BM", block_size="10D")
model.plot_extremes()

which ends up with:

ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (731,) + inhomogeneous part.

I believe you expect data at regular intervals (and so I must fill in some gaps). Correct? I'll continue fiddling a bit. This issue is more about giving an error message that a novice can understand.

Answer 1 · 2022-03-21T11:53:50.000Z

More information. I understand the error comes out of pandas, not your code directly. Just for information, my data looks like:

model = EVA(pd.Series(durations, np.sort(dates)))
print(dates, dates.dtype)
print(durations, durations.dtype)

['2002-01-01T16:00:00.000000000' '2002-01-01T20:00:00.000000000'
 '2002-01-02T03:00:00.000000000' ... '2004-10-02T23:00:00.000000000'
 '2004-10-03T07:00:00.000000000' '2004-10-03T13:00:00.000000000'] datetime64[ns]
[20. 18.  7. ...  4.  5.  7.] float64

Full stack trace:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Input In [115], in <module>
----> 1 model.get_extremes(method="BM", block_size="10D")
      2 model.plot_extremes()

File ~/.local/lib/python3.9/site-packages/pyextremes/eva.py:452, in EVA.get_extremes(self, method, extremes_type, **kwargs)
    450 message = f"for method='{method}' and extremes_type='{extremes_type}'"
    451 logger.debug("extracting extreme values %s", message)
--> 452 self.__extremes = get_extremes(
    453     method=method,
    454     ts=self.data,
    455     extremes_type=extremes_type,
    456     **kwargs,
    457 )
    458 self.__extremes_method = method
    459 self.__extremes_type = extremes_type

File ~/.local/lib/python3.9/site-packages/pyextremes/extremes/extremes.py:59, in get_extremes(ts, method, extremes_type, **kwargs)
     13 """
     14 Get extreme events from time series.
     15 
   (...)
     56 
     57 """
     58 if method == "BM":
---> 59     return get_extremes_block_maxima(
     60         ts=ts,
     61         extremes_type=extremes_type,
     62         **kwargs,
     63     )
     64 if method == "POT":
     65     return get_extremes_peaks_over_threshold(
     66         ts=ts,
     67         extremes_type=extremes_type,
     68         **kwargs,
     69     )

File ~/.local/lib/python3.9/site-packages/pyextremes/extremes/block_maxima.py:148, in get_extremes_block_maxima(ts, extremes_type, block_size, errors, min_last_block)
    137     warnings.warn(
    138         message=f"{empty_intervals} blocks contained no data",
    139         category=NoDataBlockWarning,
    140     )
    142 logger.debug(
    143     "successfully collected %d extreme events, found %s no-data blocks",
    144     len(extreme_values),
    145     empty_intervals,
    146 )
--> 148 return pd.Series(
    149     data=extreme_values,
    150     index=pd.Index(data=extreme_indices, name=ts.index.name or "date-time"),
    151     dtype=np.float64,
    152     name=ts.name or "extreme values",
    153 ).fillna(np.nanmean(extreme_values))

File ~/.local/lib/python3.9/site-packages/pandas/core/series.py:439, in Series.__init__(self, data, index, dtype, name, copy, fastpath)
    437         data = data.copy()
    438 else:
--> 439     data = sanitize_array(data, index, dtype, copy)
    441     manager = get_option("mode.data_manager")
    442     if manager == "block":

File ~/.local/lib/python3.9/site-packages/pandas/core/construction.py:570, in sanitize_array(data, index, dtype, copy, raise_cast_failure, allow_2d)
    567     data = list(data)
    569 if dtype is not None or len(data) == 0:
--> 570     subarr = _try_cast(data, dtype, copy, raise_cast_failure)
    571 else:
    572     subarr = maybe_convert_platform(data)

File ~/.local/lib/python3.9/site-packages/pandas/core/construction.py:760, in _try_cast(arr, dtype, copy, raise_cast_failure)
    755         subarr = maybe_cast_to_integer_array(arr, dtype)
    756     else:
    757         # 4 tests fail if we move this to a try/except/else; see
    758         #  test_constructor_compound_dtypes, test_constructor_cast_failure
    759         #  test_constructor_dict_cast2, test_loc_setitem_dtype
--> 760         subarr = np.array(arr, dtype=dtype, copy=copy)
    762 except (ValueError, TypeError):
    763     if raise_cast_failure:

ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (731,) + inhomogeneous part.
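
For readers wondering what the message itself means: NumPy raises it when asked to build a fixed-dtype array from a nested sequence whose elements do not all have the same shape. A minimal sketch, independent of pyextremes (the values are made up for illustration, and the exact wording of the message depends on the NumPy version):

```python
import numpy as np

# Two scalars and one length-2 list: a "ragged" sequence.
ragged = [1.0, [2.0, 3.0], 4.0]

try:
    np.array(ragged, dtype=np.float64)
    message = None
except ValueError as exc:
    # NumPy cannot fit a length-2 list into a single float slot.
    message = str(exc)

print(message)
```

Read this way, the "(731,)" in the traceback suggests a list of 731 extracted values in which at least one entry is itself a sequence rather than a single float.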

Answer 2 · 2022-03-21T15:10:51.000Z

After cleaning my data, that is, making sure all dates are represented and setting "non-existing" data to zero, the program runs. But this makes me nervous: filling the gaps with zeros treats zero as an actual observation when it is not.

Answer 3 · 2022-03-22T20:05:06.000Z

@wiz21b can you share your data so that I can reproduce your error? This is not meant to happen because EVA pre-processes the data during initialization - this may be a scenario I didn't account for.

Also data doesn't have to be at regular intervals.

Answer 4 · 2022-09-27T16:25:29.000Z

What was the solution? I'm having the same issue.

Answer 5 · 2022-09-27T17:28:44.000Z

@coastalmodeler I have never heard back from @wiz21b so I don't know if the issue is resolved. I can reopen this issue for you if you post details about your error.

Answer 6 · 2022-09-27T18:54:38.000Z

Cool thanks. I'm getting the same error as wiz21b when I run the get_extremes command. I also get the error when I try to execute some of the plot functions and POT functions. I haven't been able to figure out why.

I'm following the tutorial exactly, but with data from a different NOAA station, using the noaa_coops package to download the data into a DataFrame. See the code below:

tide_gauge=noaa_coops.Station(8775237)

#https://api.tidesandcurrents.noaa.gov/api/prod/#products
df_water_levels=tide_gauge.get_data(
    begin_date="20040406",
    end_date="20220925",
    product="water_level",
    datum="NAVD",
    units="english",
    time_zone="LST")

I then normalize the dataset by adjusting for RSLR:
measured_rslr=5.54*0.00328084 #ft/yr

df_water_levels_corrected=df_water_levels['water_level'].copy().sort_index(ascending=True).astype(float).dropna()

df_water_levels_corrected=df_water_levels_corrected-(df_water_levels_corrected.index.array-pd.to_datetime("1992"))/pd.to_timedelta("365.2425D")*measured_rslr

am=pyextremes.EVA(df_water_levels_corrected)

Everything works up until this point and here is the command that results in the error:

am.get_extremes(method="BM", block_size="365.2425D",errors="ignore")

am.get_extremes(method="BM", block_size="365.2425D",errors="ignore")
C:\Users: NoDataBlockWarning: 1 blocks contained no data
warnings.warn(
Traceback (most recent call last):

Input In [94] in <cell line: 1>
am.get_extremes(method="BM", block_size="365.2425D",errors="ignore")

File ~.conda\envs\work\lib\site-packages\pyextremes\eva.py:452 in get_extremes
self.__extremes = get_extremes(

File ~.conda\envs\work\lib\site-packages\pyextremes\extremes\extremes.py:59 in get_extremes
return get_extremes_block_maxima(

File ~.conda\envs\work\lib\site-packages\pyextremes\extremes\block_maxima.py:148 in get_extremes_block_maxima
return pd.Series(

File ~.conda\envs\work\lib\site-packages\pandas\core\series.py:451 in __init__
data = sanitize_array(data, index, dtype, copy)

File ~.conda\envs\work\lib\site-packages\pandas\core\construction.py:594 in sanitize_array
subarr = _try_cast(data, dtype, copy, raise_cast_failure)

File ~.conda\envs\work\lib\site-packages\pandas\core\construction.py:784 in _try_cast
subarr = np.array(arr, dtype=dtype, copy=copy)

ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (18,) + inhomogeneous part.

Answer 7 · 2022-09-28T00:49:08.000Z

@coastalmodeler thank you for your input. In order to help you I'll need to be able to reproduce your error. Please provide the following:

OS
Python version
pyextremes version
numpy version
pandas version

In addition to that I'll need a complete code snippet which can be run as is. For example:

import pandas as pd
import pyextremes

data = pd.read_csv("data.csv")
model = pyextremes.EVA(data)
model.get_extremes()

And provide a link to your data.csv. You can also make a GitHub gist with jupyter notebook if that's what you prefer.
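
The version details requested above can be collected in one go. This is a sketch using only the standard library; the helper name `package_version` is made up for illustration:

```python
import platform
import sys
from importlib import metadata


def package_version(name: str) -> str:
    """Return the installed version of a package, or a placeholder if absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return "not installed"


report = {
    "OS": platform.platform(),
    "Python": sys.version.split()[0],
    "pyextremes": package_version("pyextremes"),
    "numpy": package_version("numpy"),
    "pandas": package_version("pandas"),
}

for key, value in report.items():
    print(f"{key}: {value}")
```

The printed lines can be pasted directly into the issue.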

Answer 8 · 2022-09-28T13:46:25.000Z

Thank you for your quick response. See info and code below.

OS: Microsoft Windows 10
Python Version: 3.8.13
pyextremes version: 2.2.4
numpy version: 1.21.5
pandas version: 1.4.3
noaa_coops version: 0.1.9

Here's the code. Note that there is no CSV file: I'm using noaa-coops to download the data directly into Python from the API. The noaa-coops wrapper can be found here: https://pypi.org/project/noaa-coops/

import noaa_coops as nc
import pyextremes
import numpy as np
import pandas as pd

tide_gauge=nc.Station(8775237)

#https://api.tidesandcurrents.noaa.gov/api/prod/#products
df_water_levels=tide_gauge.get_data(
    begin_date="20040406",
    end_date="20220925",
    product="water_level",
    datum="NAVD",
    units="english",
    time_zone="LST")

measured_rslr=5.54*0.00328084 

df_water_levels_corrected=df_water_levels['water_level'].copy().sort_index(ascending=True).astype(float).dropna()

df_water_levels_corrected=df_water_levels_corrected-(df_water_levels_corrected.index.array-pd.to_datetime("1992"))/pd.to_timedelta("365.2425D")*measured_rslr

am=pyextremes.EVA(df_water_levels_corrected)
am.get_extremes(method="BM",errors="ignore")

Answer 9 · 2022-09-28T16:53:46.000Z

I think I found the issue. There were duplicate time steps in the NOAA dataset. Once I removed those the code works as intended. I'd guess that was the same problem @wiz21b was having. Thank you for your time.
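
For anyone who wants to check their own series for the same problem, here is a minimal sketch of finding and removing duplicate timestamps with pandas. The data is made up, and which value to keep for a repeated timestamp is a judgment call that the thread does not settle:

```python
import pandas as pd

# Made-up series with one repeated timestamp (the second and third entries).
index = pd.to_datetime(
    ["2004-04-06 00:00", "2004-04-06 00:06", "2004-04-06 00:06", "2004-04-06 00:12"]
)
series = pd.Series([1.0, 1.2, 1.3, 1.1], index=index, name="water_level")

# Boolean mask marking every repeat of an already-seen timestamp.
is_repeat = series.index.duplicated(keep="first")
n_repeats = int(is_repeat.sum())

# Option A: keep the first value recorded for each timestamp.
deduplicated = series[~is_repeat]

# Option B: keep the largest value per timestamp (may suit a maxima analysis).
max_per_timestamp = series.groupby(level=0).max()

print(n_repeats, deduplicated.index.is_unique)
```

Either result has a unique index and can be passed on in place of the original series.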

Answer 10 · 2022-09-29T12:40:04.000Z

@coastalmodeler thank you for posting your solution here, it was an issue with the EVA class not removing duplicates - I have included a fix in the latest release
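
One plausible way duplicates lead to this particular error is sketched below. It assumes the block maxima are gathered with a label-based lookup such as `series.loc[series.idxmax()]`; that is an assumption about the internals, not something confirmed in this thread. With a repeated label, pandas returns a Series instead of a single float:

```python
import numpy as np
import pandas as pd

# Made-up series; the timestamp of the maximum appears twice.
index = pd.to_datetime(["2002-01-01 16:00", "2002-01-01 20:00", "2002-01-01 20:00"])
series = pd.Series([7.0, 20.0, 18.0], index=index)

peak_time = series.idxmax()    # label of the largest value
peak = series.loc[peak_time]   # duplicated label -> Series, not a float

# Collecting such lookups gives a ragged list, which cannot be cast to floats.
collected = [5.0, peak]
try:
    np.array(collected, dtype=np.float64)
    failed = False
except ValueError:
    failed = True

print(type(peak).__name__, failed)
```

With a unique index the same lookup returns a scalar and the cast succeeds.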

Answer 11 · 2024-07-19T15:16:23.000Z

Hi, just to add some information for anyone who might have a similar problem. I was getting the same error, even though I had hourly data. On closer inspection, I noticed that the time step between data points (dt) was not exactly 1 hour: dt.max()-dt.min() was np.float64(1.1641532182693481e-10). Creating a new, perfectly spaced index solved the problem.
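
The workaround described here could look roughly like the following sketch. The data and the nanosecond jitter are made up for illustration, and the thread does not establish whether uneven spacing was the actual cause of the error:

```python
import numpy as np
import pandas as pd

# Made-up hourly series whose timestamps carry a few nanoseconds of jitter.
n = 48
one_hour = pd.Timedelta(hours=1)
ideal = pd.date_range("1990-01-01", periods=n, freq=one_hour)
jitter = pd.to_timedelta(np.arange(n) % 3, unit="ns")
series = pd.Series(np.linspace(0.0, 1.0, n), index=ideal + jitter)

# Spread of the time step, in hours; a non-zero value means uneven spacing.
dt = series.index.to_series().diff().dropna() / one_hour
spread_before = dt.max() - dt.min()

# Rebuild the index from the first timestamp with an exact one-hour step.
regular = series.copy()
regular.index = pd.date_range(start=series.index[0], periods=len(series), freq=one_hour)
dt_regular = regular.index.to_series().diff().dropna() / one_hour
spread_after = dt_regular.max() - dt_regular.min()

print(spread_before, spread_after)
```

Replacing the index this way assumes the samples really are hourly and that none are missing; otherwise the values end up attached to the wrong times.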

Answer 12 · 2024-07-20T20:03:25.000Z

@MBendoni did you have this issue with the latest version? This is not supposed to happen, your data can be spaced at any intervals - pyextremes should be able to handle that. Can you please provide instructions to reproduce this error?

Answer 13 · 2024-07-23T08:43:59.000Z

@georgebv the pyextremes version is 2.3.2. To reproduce the error, I think I would need to send you the netCDF file where the data are stored. In any case, here is a code snippet that produces the error:

import netCDF4 as nc
import pandas as pd
import numpy as np
import pyextremes as pxtr

file_1 = 'myfile.nc'
ds_1 = nc.Dataset(file_1)
ref_date = np.datetime64('1990-01-01 00:00:00')
ny = 30
nd = ny*365*24
t_1 = ref_date + ds_1.variables['time'][0:nd].data.astype('timedelta64[D]')
var_name = 'wl'
v_1 = ds_1.variables[var_name][0:nd].data.flatten()
v_1[v_1<0]=0
s_1 = pd.Series(data=v_1, index=t_1)
ex_1 = pxtr.get_extremes(s_1, "POT", threshold=0.5, r="12h")

Answer 14 · 2024-07-23T19:12:28.000Z

@MBendoni can you please write an example that doesn't rely on external data? You can generate random data inside the snippet.

https://numpy.org/doc/stable/reference/random/index.html
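
A self-contained starting point for such an example might look like the sketch below. The distribution, its parameters, and the series length are arbitrary choices, and the pyextremes call is left commented out so that the data generation runs on its own:

```python
import numpy as np
import pandas as pd

# Seeded generator so that every run produces the same data.
rng = np.random.default_rng(seed=0)

# Thirty days of made-up hourly water levels (values are arbitrary).
n = 30 * 24
index = pd.date_range("1990-01-01", periods=n, freq=pd.Timedelta(hours=1))
values = rng.gumbel(loc=0.3, scale=0.2, size=n)
values[values < 0] = 0  # same clipping as in the reported snippet
series = pd.Series(data=values, index=index, name="wl")

# The call from the report would then follow, for example:
# import pyextremes as pxtr
# extremes = pxtr.get_extremes(series, "POT", threshold=0.5, r="12h")

print(series.head())
```

If the error only appears with the original timestamps, the index construction can be swapped for one that mimics how they were built.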