pmlmodelling/nctoolkit

IndexError: list index out of range in open_data()

break2make opened this issue · 37 comments

python nc_toolkit.py 
Please install CDO version 1.9.7 or above: https://code.mpimet.mpg.de/projects/cdo/ or https://anaconda.org/conda-forge/cdo
0.7.6
Traceback (most recent call last):
  File "x\gis_experiments\nc_toolkit.py", line 16, in <module>
    main()
  File "x\gis_experiments\nc_toolkit.py", line 7, in main     
    ds = nc.open_data("./data/tasmax_day_EC-Earth3_ssp245_r1i1p1f1_gr_2015.nc")
  File "x\gis_experiments\venv\lib\site-packages\nctoolkit\api.py", line 759, in open_data
    list1 = d.contents.reset_index(drop=True).data_type
  File "x\gis_experiments\venv\lib\site-packages\nctoolkit\api.py", line 1478, in contents
    return self.show_contents()
  File "x\gis_experiments\venv\lib\site-packages\nctoolkit\api.py", line 1315, in show_contents
    if out_inc[i]:
IndexError: list index out of range

I'm using Python 3.10 in Windows 10. Please help to resolve this issue.

Hi @break2make. As stated on the package website, nctoolkit will not work on Windows. Your options are to use Linux/macOS or to use the Windows Subsystem for Linux (WSL).

Hello, I have the same error in open_data() while working with Ubuntu (via VirtualBox), nctoolkit 0.8.6, CDO 2.1.1 and Python 3.10.8.
I used a virtual environment in which I first installed the latest version of CDO, then nctoolkit 0.2.2 (via conda install -c conda-forge nctoolkit; I did not really understand why such an old version was installed). open_data() worked, but I could not read my netCDF files correctly (empty .time(), empty .years() and IndexError: list index out of range for .plot()), so I updated nctoolkit to the most recent version, which gave me this error...

Hi @agnesfrancois

Can you provide the line of code that's giving the problem and the full python error?

The package is tested daily with CDO 2.1.1 and Python 3.10.8: https://app.circleci.com/pipelines/github/pmlmodelling/nctoolkit. And the tests are all passing at the minute. So it's most likely a versioning issue. Potentially conda is installing a very old version of a dependency that's incompatible in some way.

If possible, could you upload a .yml file with your conda environment and I can possibly see if I can reproduce the problem.

Yes, thank you for your really quick answer. My code (with screenshot attached):
`import matplotlib as plt
import nctoolkit as nc
...
file = nc.open_data('SWIO12_CNRM-ESM2-1_HIST_r1i1p1f2_CNRM-ALADIN63_v2_frac_land_fx_once_REU_grid003.nc')


IndexError Traceback (most recent call last)
Cell In[8], line 1
----> 1 file = nc.open_data('SWIO12_CNRM-ESM2-1_HIST_r1i1p1f2_CNRM-ALADIN63_v2_frac_land_fx_once_REU_grid003.nc')

File ~/anaconda3/envs/cdo/lib/python3.10/site-packages/nctoolkit/api.py:755, in open_data(x, checks, **kwargs)
752 d._thredds = thredds
754 if (len(d) == 1) and checks and (thredds is False):
--> 755 d_contents = d.contents.reset_index(drop = True)
756 try:
757 d_sub = d_contents.query("fill_value == 0.0").reset_index(drop = True)

File ~/anaconda3/envs/cdo/lib/python3.10/site-packages/nctoolkit/api.py:1506, in DataSet.contents(self)
1499 @property
1500 def contents(self):
1501 """
1502 Detailed list of variables contained in a dataset.
1503 This will only display the variables in the first file of an ensemble.
1504 """
-> 1506 return self.show_contents()

File ~/anaconda3/envs/cdo/lib/python3.10/site-packages/nctoolkit/api.py:1335, in DataSet.show_contents(self, n)
1333 i = 1
1334 while True:
-> 1335 if out_inc[i]:
1336 break
1337 i += 1

IndexError: list index out of range`

image

I don't see how to upload a .yml here (file type not supported) but here is my spec file
spec-file-export.txt

OK. Based on the output, nctoolkit's backend CDO does not like the file that much, @agnesfrancois. Is there any way to share the file?

For now, you could try:

file = nc.open_data('SWIO12_CNRM-ESM2-1_HIST_r1i1p1f2_CNRM-ALADIN63_v2_frac_land_fx_once_REU_grid003.nc', checks = False)

This won't check the contents of the file when it is opened. The checking is currently throwing the error.

Can you try running this on the command line:

cdo sinfon SWIO12_CNRM-ESM2-1_HIST_r1i1p1f2_CNRM-ALADIN63_v2_frac_land_fx_once_REU_grid003.nc

and add the results on here? That will show if there is a fundamental problem with the data itself or if maybe I need to tweak something in nctoolkit to handle the file.
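If it is easier to collect this from Python than from a terminal, the sketch below runs the same `cdo sinfon` command with `subprocess.run` and returns the exit code and both output streams. The helper name `run_cdo_sinfon` is hypothetical and not part of nctoolkit; it returns `None` when no `cdo` executable is found on the PATH, which is itself a useful finding.

```python
import shutil
import subprocess


def run_cdo_sinfon(path, cdo="cdo"):
    """Run `cdo sinfon <path>` and return (returncode, stdout, stderr).

    Returns None when the executable cannot be found on PATH.
    Hypothetical helper for illustration; not part of nctoolkit.
    """
    if shutil.which(cdo) is None:
        return None
    result = subprocess.run(
        [cdo, "sinfon", path],  # argument list avoids shell quoting problems
        capture_output=True,
        text=True,
        check=False,  # report a non-zero exit code instead of raising
    )
    return result.returncode, result.stdout, result.stderr
```

Calling `run_cdo_sinfon("your_file.nc")` and pasting the returned tuple here would give the same information as the terminal output.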

Thank you for your answer. Unfortunately, I don't think I can share the file. Here are the results from the terminal:
image
I can also see with netCDF4 that the file was made with CDO v1.7.0 with convention CF-1.6 and file format HDF5.
(Indeed, with checks=False, I don't have the error anymore, but functions on the dataset don't work then)

OK. That's strange. Based on that, you shouldn't be getting the error message. It sounds like there is some issue calling CDO (using subprocess) in nctoolkit.

What errors are you getting when you try methods?

I tried several of my files. .variables, .years and .times all return an empty list (the other files have time-dependent data), .spatial_mean() shows nothing, and .mean() raises "AttributeError: 'DataSet' object has no attribute 'mean'". When I try to plot my files I get the following error (the same IndexError appears with file.contents):
image

There might be something I don't get, I will try several things.

EDIT: it seems that when I convert my files with file.to_dataframe(), everything works well, so I will work with that if I can't get DataSets to work...

Can you try the following:

import nctoolkit as nc
ds = nc.open_thredds("https://psl.noaa.gov/thredds/dodsC/Datasets/COBE/sst.mon.mean.nc")
ds.subset(time = 0)
ds.plot()

If that doesn't work then there must be a package mismatch.

Also, try:
nc.cdo_version()

That should show 2.1.1. If it does not then CDO is not actually visible to Python.

Indeed, there is a problem with both tests.
nc.cdo_version() does not show anything, and the four lines of code give me the following error:

The dataset has been reset to the starting point due to a run failure! Please change commands, where applicable, and re-run.

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
File ~/anaconda3/envs/cdo/lib/python3.10/site-packages/nctoolkit/runthis.py:1024, in run_this(os_command, self, output, out_file, suppress)
   1022 else:
-> 1024     target = run_cdo(
   1025         ff_command, target, out_file, precision=self._precision
   1026     )
   1027     target_list.append(target)

File ~/anaconda3/envs/cdo/lib/python3.10/site-packages/nctoolkit/runthis.py:593, in run_cdo(command, target, out_file, overwrite, precision)
    592     remove_safe(target)
--> 593     raise ValueError(f"{command} was not successful. Check output")
    595 session_info["latest_size"] = os.path.getsize(target)

ValueError: cdo -L  -seltimestep,1 https://psl.noaa.gov/thredds/dodsC/Datasets/COBE/sst.mon.mean.nc /var/tmp/nctoolkitzmxhixgrnctoolkittmpnti50ji4.nc was not successful. Check output

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
Cell In[15], line 4
      2 ds = nc.open_thredds("https://psl.noaa.gov/thredds/dodsC/Datasets/COBE/sst.mon.mean.nc")
      3 ds.subset(time = 0)
----> 4 ds.plot()

File ~/anaconda3/envs/cdo/lib/python3.10/site-packages/nctoolkit/plot.py:61, in plot(self, vars, autoscale, out, coast, **kwargs)
     58     kwargs["title"] = ""
     60 # run any commands
---> 61 self.run()
     63 if session_info["coast"]:
     64     coast = True

File ~/anaconda3/envs/cdo/lib/python3.10/site-packages/nctoolkit/run.py:38, in run(self)
     35 if self._merged:
     36     output_method = "one"
---> 38 run_this(cdo_command, self, output=output_method)
     40 self._merged = False
     42 self._execute = False

File ~/anaconda3/envs/cdo/lib/python3.10/site-packages/nctoolkit/runthis.py:1198, in run_this(os_command, self, output, out_file, suppress)
   1196 except Exception as e:
   1197     self.reset()
-> 1198     raise ValueError(e)

ValueError: cdo -L  -seltimestep,1 https://psl.noaa.gov/thredds/dodsC/Datasets/COBE/sst.mon.mean.nc /var/tmp/nctoolkitzmxhixgrnctoolkittmpnti50ji4.nc was not successful. Check output

OK. It sounds like something has gone wrong in your conda environment and CDO isn't accessible from Python.

Try this:

import subprocess
subprocess.Popen("cdo --version", shell = True)

You should get something like the output below. If you don't then something has gone wrong in your conda environment.

Climate Data Operators version 2.1.1 (https://mpimet.mpg.de/cdo)
System: x86_64-conda-linux-gnu
CXX Compiler: /home/conda/feedstock_root/build_artifacts/cdo_1671208954590/_build_env/bin/x86_64-conda-linux-gnu-c++ -fPIC -DPIC -g -O2 -fopenmp -march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O2 -ffunction-sections -pipe -isystem /local1/data/scratch/rwi/mambaforge3/envs/nc/include -fdebug-prefix-map=/home/conda/feedstock_root/build_artifacts/cdo_1671208954590/work=/usr/local/src/conda/cdo-2.1.1 -fdebug-prefix-map=/local1/data/scratch/rwi/mambaforge3/envs/nc=/usr/local/src/conda-prefix -fopenmp -pthread
CXX version : unknown
C Compiler: /home/conda/feedstock_root/build_artifacts/cdo_1671208954590/_build_env/bin/x86_64-conda-linux-gnu-cc -march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O2 -ffunction-sections -pipe -isystem /local1/data/scratch/rwi/mambaforge3/envs/nc/include -fdebug-prefix-map=/home/conda/feedstock_root/build_artifacts/cdo_1671208954590/work=/usr/local/src/conda/cdo-2.1.1 -fdebug-prefix-map=/local1/data/scratch/rwi/mambaforge3/envs/nc=/usr/local/src/conda-prefix -fopenmp -pthread -pthread
C version : unknown
F77 Compiler: /home/conda/feedstock_root/build_artifacts/cdo_1671208954590/_build_env/bin/x86_64-conda-linux-gnu-gfortran -march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O2 -ffunction-sections -pipe -isystem /local1/data/scratch/rwi/mambaforge3/envs/nc/include -fdebug-prefix-map=/home/conda/feedstock_root/build_artifacts/cdo_1671208954590/work=/usr/local/src/conda/cdo-2.1.1 -fdebug-prefix-map=/local1/data/scratch/rwi/mambaforge3/envs/nc=/usr/local/src/conda-prefix
F77 version : GNU Fortran (conda-forge gcc 11.3.0-19) 11.3.0
Features: 31GB 12threads c++17 OpenMP45 Fortran pthreads HDF5 NC4/HDF5/threadsafe OPeNDAP udunits2 proj xml2 magics curl fftw3 sse3
Libraries: yac/2.6.1 HDF5/1.12.2 proj/9.1.1 xml2/2.10.3 curl/7.86.0 magics/4.12.1
CDI data types: SizeType=size_t
CDI file types: srv ext ieg grb1 grb2 nc1 nc2 nc4 nc4c nc5 nczarr
CDI library version : 2.1.1
cgribex library version : 2.0.2
ecCodes library version : 2.27.0
NetCDF library version : 4.8.1 of Oct 31 2022 22:17:45 $
HDF5 library version : 1.12.2 threadsafe
exse library version : 1.4.2
FILE library version : 1.9.1
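The same check can be scripted so that a missing executable is reported explicitly rather than as empty output. This is a minimal sketch, assuming only the standard library; the helper name `cdo_version_banner` is hypothetical. CDO may write its version banner to stderr rather than stdout, so the sketch reads both streams.

```python
import shutil
import subprocess


def cdo_version_banner(cdo="cdo"):
    """Return the first line of `cdo --version`, or None if cdo is not on PATH.

    Hypothetical helper for illustration; not part of nctoolkit.
    """
    exe = shutil.which(cdo)
    if exe is None:
        return None
    result = subprocess.run(
        [exe, "--version"], capture_output=True, text=True, check=False
    )
    # Combine both streams because the banner may be printed to either one.
    output = (result.stdout + result.stderr).strip()
    return output.splitlines()[0] if output else ""


banner = cdo_version_banner()
print("cdo not found on PATH" if banner is None else banner)
```

A result of `None` means the Python process cannot see any `cdo` executable, which points at the environment rather than at nctoolkit.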

OK, thank you. I get the following output, so something is wrong with my environment. I will try again with a new one.

image

That suggests the jupyter notebook you are using is not actually from the CDO environment.

Check by running this in the notebook

conda list

You should see CDO in the packages. Otherwise, the notebook package is probably outside the CDO environment.

Indeed, CDO is in the packages...

image

Very strange. Try this in the notebook:

! cdo --version

It's possible the Python version being used by the notebook does not actually come from the environment. Did you specify Python version when creating the environment?

I have the following output (in French, sorry):
image

For the environment I was using until now, yes I had to specify the version. As I wanted to upgrade nctoolkit v0.2.2, I had to specify Python v3.10 as I first had Python v3.11. However, I just tried with a new environment, with conda install -c conda-forge nctoolkit as the only command (it was not working before), and I have the exact same outputs

OK, that's strange. CDO is in the environment, but is not accessible from the notebook. Running ! cdo --version from a notebook should be the equivalent of running it from the terminal in the environment, as far as I understand it. Something strange must be going on with the environment.

Hello! Problem solved after checking this page: https://code.mpimet.mpg.de/boards/1/topics/13131. JupyterLab was not installed in my virtual environment, so the notebook was not linked to the environment.

Thanks for confirming the problem @agnesfrancois. Always tricky to ensure everything you are using is in the environment
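For anyone hitting the same thing, one way to check this from inside a notebook is to look at which interpreter the kernel is running and whether the `cdo` it finds lives under that interpreter's prefix. This is a rough sketch using only the standard library; the function name is hypothetical, and the `bin` subdirectory is an assumption that holds for conda environments on Linux and macOS.

```python
import os
import shutil
import sys


def describe_kernel_environment(tool="cdo"):
    """Report the running interpreter and whether `tool` sits in its environment.

    Hypothetical helper; assumes executables live in <prefix>/bin.
    """
    tool_path = shutil.which(tool)
    env_bin = os.path.abspath(os.path.join(sys.prefix, "bin"))
    in_env = tool_path is not None and os.path.abspath(tool_path).startswith(
        env_bin + os.sep
    )
    return {
        "python": sys.executable,    # interpreter used by this kernel
        "prefix": sys.prefix,        # root of the environment it belongs to
        "tool_path": tool_path,      # None if the tool is not on PATH
        "tool_in_this_env": in_env,  # False if found elsewhere or not found
    }


for key, value in describe_kernel_environment().items():
    print(f"{key}: {value}")
```

If `python` points outside the environment you created, the notebook server was started from somewhere else, and installing Jupyter inside the environment (as above) is the likely fix.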

I have a similar problem on Ubuntu. Is a solution available?

import nctoolkit as nc
Traceback (most recent call last):
File "", line 1, in <module>
File "/home/sarr/anaconda3/envs/CDO_environment/lib/python3.11/site-packages/nctoolkit/__init__.py", line 54, in <module>
if valid(cdo_version) is False:
^^^^^^^^^^^^^^^^^^
File "/home/sarr/anaconda3/envs/CDO_environment/lib/python3.11/site-packages/nctoolkit/__init__.py", line 26, in valid
where = [m.start() for m in re.finditer(sub, string)][n - 1]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^
IndexError: list index out of range

Hi @fipoucat. Which version of nctoolkit and CDO are installed?

Hi, I am using nctoolkit 0.8.7 and cdo 2.1.1

Can you double check that you are actually using nctoolkit 0.8.7 in your environment, @fipoucat? It looks like it is picking up a much older version. You are getting an error at line 54 in the __init__.py file, but there isn't a line 54: https://github.com/pmlmodelling/nctoolkit/blob/master/nctoolkit/__init__.py

This looks like nctoolkit version 0.2.X, which won't work with CDO 2.0.X because they changed how output was returned.
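A quick way to confirm which version the active environment actually resolves, without importing nctoolkit (which is what fails here), is to ask the standard library's `importlib.metadata`. This is a sketch; the helper names are hypothetical, and `version_tuple` only looks at the first three numeric components of the version string.

```python
import re
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def version_tuple(version):
    """Turn '0.8.7' into (0, 8, 7) so versions compare numerically."""
    return tuple(int(part) for part in re.findall(r"\d+", version)[:3])


found = installed_version("nctoolkit")
if found is None:
    print("nctoolkit is not installed in this environment")
else:
    print("nctoolkit", found, version_tuple(found))
```

If this prints something like 0.2.2 while `conda list` in another shell shows 0.8.7, the two are looking at different environments.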

I'll try to uninstall it and reinstall and will update you

@robertjwilson, that's correct. Version 0.2.2 was installed in the cdo env, so uninstalling it solved the issue.
Thank you for the quick support

I spoke too soon. When testing the example I get a strange error:
AttributeError: module 'nctoolkit' has no attribute 'open_data'

Please add the code that caused the issue. This doesn't sound like something that could happen, so it's probably an environment issue.

I am testing the example in the user guide:
import nctoolkit as nc

import numpy as mp

ds = nc.open_data("/home/sarr/work/READING_ASSESS/sst.mon.mean.nc")
Traceback (most recent call last):

Cell In[4], line 1
ds = nc.open_data("/home/sarr/work/READING_ASSESS/sst.mon.mean.nc")

AttributeError: module 'nctoolkit' has no attribute 'open_data'

I don't think there is anything within nctoolkit that could cause that to happen. Something must have gone wrong when installing or setting up environments.

What happens when you try to autocomplete after typing nc. in your notebook/ipython? Does anything show up?

I removed all the installations and it works now after reinstalling.

Thank you

@robertjwilson ,
A follow-up problem since my previous comment on my installation problem:
When opening my netCDF file with nc.open_data("output.nc") I get this message:
The variable(s) z,lsm,cl have integer data type. Consider setting data type to float 'F64' or 'F32' using set_precision.
Any hint for a solution?
Thank you

This is normally not something to worry about. nctoolkit uses CDO which will preserve the netCDF data type when carrying out calculations. So if you have an integer data type, all calculations will use that data type.

However, you can change the data type by doing

ds.set_precision("F32")

In general, data type doesn't matter too much. There are some cases where you need to be careful. For example, let's say you wanted to calculate the fraction of years when temperature in a dataset was above 10 C. You could do the following:

ds > 10
ds.tmean()

However, if the data type is integer, you will probably just have 0s and 1s in the dataset before tmean, and you can only end up with a 0 or 1 in the output because it is integer.
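To make the arithmetic concrete, here is a plain-Python illustration that does not use nctoolkit or CDO. The temperatures are made-up values, and whether an integer-typed output is truncated or rounded is an assumption here; both are shown, and either way the fraction is lost.

```python
# Hypothetical yearly mean temperatures in degrees C (illustrative values only)
temps_c = [8.0, 12.5, 11.0, 9.5, 14.0]

# Equivalent of `ds > 10`: a 0/1 mask per year
flags = [1 if t > 10 else 0 for t in temps_c]

# Equivalent of `tmean` with a floating-point data type
fraction_float = sum(flags) / len(flags)

# What survives if the result has to be stored as an integer
fraction_truncated = int(fraction_float)
fraction_rounded = round(fraction_float)

print(flags)
print(fraction_float, fraction_truncated, fraction_rounded)
```

With a float data type the mean keeps the fractional value; stored as an integer it can only be 0 or 1, which is why setting the precision first matters for this kind of calculation.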

Hello, I get the same error when trying to import nctoolkit.

`---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
Cell In[129], line 1
----> 1 import nctoolkit as nc

File ~/miniconda3/lib/python3.10/site-packages/nctoolkit/__init__.py:54
52 cdo_check = cdo_check.replace("b'", "").strip()
53 cdo_version = cdo_check.split("(")[0].strip().split(" ")[-1]
---> 54 if valid(cdo_version) is False:
55 print(
56 "Please install CDO version 1.9.3 or above: https://code.mpimet.mpg.de/projects/cdo/ or https://anaconda.org/conda-forge/cdo"
57 )

File ~/miniconda3/lib/python3.10/site-packages/nctoolkit/__init__.py:26, in valid(string)
24 wanted = ""
25 n = 3
---> 26 where = [m.start() for m in re.finditer(sub, string)][n - 1]
28 string = re.sub("[A-Za-z]", "", string)
30 before = string[:where]

IndexError: list index out of range`

I have cdo version 2.0.3

Which version of nctoolkit are you using, @denisthenichita? Based on the error message, you have an old version installed. This can happen when you install with conda, which can install version 0.2.x instead of a recent version.

Old versions of nctoolkit won't work with CDO 2.0.x because CDO changed how its output was returned.
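For context on why this surfaces as an IndexError: the failing line in the traceback indexes the list of regex matches at position `n - 1`, which raises as soon as the string contains fewer than `n` matches. The sketch below reproduces that pattern with a defensive variant. The pattern `\.` and the sample strings are assumptions for illustration, not values taken from nctoolkit's source.

```python
import re


def nth_match_start(pattern, text, n):
    """Return the start index of the n-th match (1-based), or None if absent.

    Hypothetical helper showing a guarded version of
    `[m.start() for m in re.finditer(pattern, text)][n - 1]`.
    """
    starts = [m.start() for m in re.finditer(pattern, text)]
    if len(starts) < n:
        return None  # the unguarded expression would raise IndexError here
    return starts[n - 1]


print(nth_match_start(r"\.", "1.9.10.1", 3))
print(nth_match_start(r"\.", "2.0.3", 3))
```

The second call returns `None` because the string has only two matches, which is the situation the unguarded code cannot handle.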

Try updating to nctoolkit 0.9.0. Also update CDO to 2.0.5 or above. nctoolkit won't work with CDO 2.0.3 because a bug in CDO was causing one of the nctoolkit tests to fail. This was fixed in 2.0.5.

Thank you for the very quick response. conda update nctoolkit tells me "# All requested packages already installed." Indeed, I have the 0.2 version of nctoolkit. I am sorry if this is a rookie mistake; I am relatively new to Python. I am working on WSL at the moment.

That version is almost 2.5 years old.

Just do this to get the latest:

conda install nctoolkit=0.9.0

My recommendation would be to use mambaforge instead of conda. You won't run into these issues with it as much: https://github.com/conda-forge/miniforge