monocongo/climate_indices

IndexError encountered when computing SPI

Describe the bug
I encountered an "IndexError: index 159 is out of bounds for axis 1 with size 159" when attempting to use the SPI computation function from the climate-indices Python package.

To Reproduce
Steps to reproduce the behavior:

  1. Installed the climate-indices package in a conda environment.
  2. Ran the following command:

spi --periodicity monthly --scales 1 2 3 6 9 12 24 36 48 --calibration_start_year 1981 --calibration_end_year 2023 --netcdf_precip /path/to/my/netcdf/precip_data.nc --var_name_precip precip --output_file_base /path/to/my/output/CHIRPS --multiprocessing all --save_params /path/to/my/output/CHIRPS_fitting.nc --overwrite

  3. Encountered the following error:
    IndexError: index 159 is out of bounds for axis 1 with size 159

The full traceback is as follows:

multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/home/user/miniconda3/envs/climate_indices/lib/python3.7/multiprocessing/pool.py", line 121, in worker
    result = (True, func(*args, **kwds))
  File "/home/user/miniconda3/envs/climate_indices/lib/python3.7/multiprocessing/pool.py", line 44, in mapstar
    return list(map(*args))
  File "/home/user/miniconda3/envs/climate_indices/lib/python3.7/site-packages/climate_indices/__spi__.py", line 1212, in _apply_to_subarray_gamma
    periodicity=args["periodicity"],
IndexError: index 159 is out of bounds for axis 1 with size 159
"""
  File "climate_indices/__spi__.py", line 1502, in main
    _compute_write_index(kwrgs)
  File "climate_indices/__spi__.py", line 700, in _compute_write_index
    args=args,
  File "climate_indices/__spi__.py", line 1007, in _parallel_fitting
    pool.map(_apply_to_subarray_gamma, chunk_params)
  File "multiprocessing/pool.py", line 268, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "multiprocessing/pool.py", line 657, in get
    raise self._value
IndexError: index 159 is out of bounds for axis 1 with size 159

Expected behavior
I expected the climate-indices package to compute the SPI without any issues.

Desktop (please complete the following information):

  • OS: Ubuntu 22.04.1 LTS
  • Version of climate-indices used: latest version from pip

Additional Context
I was following this guideline on computing SPI using CHIRPS data.

Hi @jessefriend thanks for this error report.

Can you please re-install the climate_indices package from this development branch and see whether that fixes your issue? https://github.com/monocongo/climate_indices/tree/issue_522_pyproject_poetry

Also, could you please post a link to the dataset used for /path/to/my/netcdf/precip_data.nc in the command listed above? With that I can hopefully reproduce the error.

Hi @monocongo, thanks for the fast feedback.

I tried running it again with the development branch and still ran into the same issue.

Here is a link to the dataset on WeTransfer:
https://we.tl/t-3LBTVdu77D

Here is a description of it:

<class 'netCDF4._netCDF4.Variable'>
float32 time(time)
    units: days since 1981-1-1 00:00:00
    standard_name: time
    calendar: gregorian
    axis: T
unlimited dimensions: time
current shape = (509,)
filling on, default _FillValue of 9.969209968386869e+36 used

<class 'netCDF4._netCDF4.Variable'>
float32 lon(lon)
    units: degrees_east
    standard_name: longitude
    long_name: longitude
    axis: X
unlimited dimensions:
current shape = (159,)
filling on, default _FillValue of 9.969209968386869e+36 used

<class 'netCDF4._netCDF4.Variable'>
float32 lat(lat)
    units: degrees_north
    standard_name: latitude
    long_name: latitude
    axis: Y
unlimited dimensions:
current shape = (186,)
filling on, default _FillValue of 9.969209968386869e+36 used

<class 'netCDF4._netCDF4.Variable'>
int32 crs()
    long_name: Lon/Lat Coords in WGS84
    grid_mapping_name: latitude_longitude
    longitude_of_prime_meridian: 0.0
    semi_major_axis: 6378137.0
    inverse_flattening: 298.257223563
unlimited dimensions:
current shape = ()
filling on, default _FillValue of -2147483647 used

<class 'netCDF4._netCDF4.Variable'>
float32 precip(time, lat, lon)
    _FillValue: -9999.0
    units: mm
    standard_name: convective precipitation rate
    long_name: Climate Hazards group InfraRed Precipitation with Stations
    time_step: dekad
    missing_value: -9999.0
    geospatial_lat_min: -4.675
    geospatial_lat_max: 4.62
    geospatial_lon_min: 33.89
    geospatial_lon_max: 41.85
    grid_mapping: crs
unlimited dimensions: time
current shape = (509, 186, 159)
filling on

Hello @monocongo,

I have looked into the same issue (I am a colleague of @jessefriend) and realized that the climatology_dataset (CHIRPS) is expected to have these dimensions:

   expected_dims_3d_climate = {"lat", "lon", "time"}

Next, the check

    if len(dims) == 3:
        if dims != expected_dims_3d_climate:
            message = f"Invalid dimensions for variable '{var_name}': {dims}"
            _logger.error(message)
            raise ValueError(message)

did not catch the difference in dimension order, because two sets compare equal regardless of order:

a =  {'lat', 'lon', 'time'}
b =  {'time', 'lon', 'lat'}
a ==  b
True

while something like this would

for x, y in zip(a, b):
    assert x == y

hope this helps

I can confirm that the issue is gone after I changed the order of the data dimensions within the netCDF file to (lat, lon, time).

@iferencik You have found a sleeper bug, thank you! We're comparing the two as sets, which have no order, but the order is important, as in this case. I will leave this issue open for now as a reminder to fix this.
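The difference between an unordered and an order-sensitive check can be sketched as follows. This is only an illustration, not the package's actual fix: the names `EXPECTED_DIMS_3D`, `dims_match_in_order`, and `validate_dims` are hypothetical, and the expected order (lat, lon, time) is assumed from the names quoted in this thread. Note that iterating over a set does not give a guaranteed order either, so the dimension names have to be kept in an ordered sequence such as a tuple before comparing.

```python
# Assumed expected order, taken from the names quoted in this thread.
EXPECTED_DIMS_3D = ("lat", "lon", "time")


def dims_match_in_order(dims, expected=EXPECTED_DIMS_3D):
    """Return True only if dims has the expected names in the expected order.

    dims must be an ordered sequence (tuple or list), not a set.
    """
    return tuple(dims) == tuple(expected)


def validate_dims(var_name, dims, expected=EXPECTED_DIMS_3D):
    """Raise ValueError when the dimension names or their order differ."""
    if not dims_match_in_order(dims, expected):
        message = (
            f"Invalid dimensions for variable '{var_name}': {tuple(dims)}, "
            f"expected {tuple(expected)}"
        )
        raise ValueError(message)


file_dims = ("time", "lat", "lon")  # order reported for precip in this issue

# Set comparison ignores order, so the mismatch goes unnoticed.
print(set(file_dims) == set(EXPECTED_DIMS_3D))  # True

# Tuple comparison is order-sensitive, so the mismatch is detected.
print(dims_match_in_order(file_dims))  # False
```

With a check like this, the file from this report would be rejected up front with a ValueError instead of failing later with an IndexError inside the worker pool.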

@jessefriend Thanks for your fast follow-up confirming that this is fixed for you now. Getting the data cleaned and ready for processing is tricky. I tried to handle that for users by including some wrangling in the processing scripts, but that proved problematic because it relied on NCO, which is not well supported on Windows.
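For anyone preparing data by hand, the reordering that resolved this issue amounts to permuting the axes from (time, lat, lon) to (lat, lon, time). The sketch below only computes that permutation in plain Python. The helper name `axis_permutation` is hypothetical and the target order is assumed from this thread. The resulting tuple has the form that `numpy.transpose` accepts for its `axes` argument; in xarray, the same reordering can be requested by name with `DataArray.transpose("lat", "lon", "time")`.

```python
def axis_permutation(current_dims, target_dims=("lat", "lon", "time")):
    """Return the axis order that rearranges current_dims into target_dims.

    Entry j of the result is the index of the current axis that should
    become axis j after reordering.
    """
    current = tuple(current_dims)
    target = tuple(target_dims)
    if len(current) != len(target) or set(current) != set(target):
        raise ValueError(f"Cannot reorder {current} into {target}")
    return tuple(current.index(name) for name in target)


# Dimension order and shape as reported for the precip variable above.
current_dims = ("time", "lat", "lon")
current_shape = (509, 186, 159)

perm = axis_permutation(current_dims)
new_shape = tuple(current_shape[i] for i in perm)

print(perm)       # (1, 2, 0)
print(new_shape)  # (186, 159, 509)
```

Whichever tool does the actual reordering, it is worth confirming afterwards that the variable's dimensions really are listed as (lat, lon, time) before running the SPI command again.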

I am running into the same problem when following the same tutorial as jessefriend. @monocongo, is the bug supposed to be fixed in the current version (2.0.0)? Thank you.

IndexError: index 1500 is out of bounds for axis 1 with size 1500

This is the output of ncdump -h of my input .nc file:

netcdf africa_chirps_1months_1981_2020 {
dimensions:
    lon = 1500 ;
    lat = 1600 ;
    time = UNLIMITED ; // (516 currently)
variables:
    float time(time) ;
        time:units = "days since 1980-1-1 00:00:00" ;
        time:standard_name = "time" ;
        time:calendar = "gregorian" ;
        time:axis = "T" ;
    float lon(lon) ;
        lon:units = "degrees_east" ;
        lon:standard_name = "longitude" ;
        lon:long_name = "longitude" ;
        lon:axis = "X" ;
    float lat(lat) ;
        lat:units = "degrees_north" ;
        lat:standard_name = "latitude" ;
        lat:long_name = "latitude" ;
        lat:axis = "Y" ;
    int crs ;
        crs:long_name = "Lon/Lat Coords in WGS84" ;
        crs:grid_mapping_name = "latitude_longitude" ;
        crs:longitude_of_prime_meridian = 0. ;
        crs:semi_major_axis = 6378137. ;
        crs:inverse_flattening = 298.257223563 ;
    float precip(time, lat, lon) ;
        precip:_FillValue = -9999.f ;
        precip:units = "mm" ;
        precip:standard_name = "convective precipitation rate" ;
        precip:long_name = "Climate Hazards group InfraRed Precipitation with Stations" ;
        precip:time_step = "dekad" ;
        precip:missing_value = -9999.f ;
        precip:grid_mapping = "crs" ;