monocongo/climate_indices

Order of dimensions

farheen2022 opened this issue · 6 comments

Sir, kindly tell me how to change the name and order of the dimensions of a NetCDF file. I am using the CHIRPS dataset and my dimension name is latitude; I am unable to change it to lat, and that is raising an error. I have tried using ncpdq in the Conda prompt to correct the order of dimensions, but that raises an error related to the size of the internal memory.

You can rename a netCDF dimension with the xarray `Dataset.rename()` method, e.g. `dataset = dataset.rename({'latitude': 'lat'})`

If you need to change the ordering of dimensions, you can use `DataArray.transpose()`, e.g. `data["prcp"].transpose("lat", "lon", "time")`
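Putting the two calls above together, a minimal sketch of the rename-then-transpose workflow, using a small synthetic dataset in place of the CHIRPS file (the variable name `prcp` and the array sizes here are just for illustration):

```python
import numpy as np
import xarray as xr

# Build a small dataset with the "wrong" dimension names and order.
ds = xr.Dataset(
    {"prcp": (("time", "longitude", "latitude"), np.zeros((4, 3, 2)))},
    coords={
        "time": np.arange(4),
        "longitude": np.arange(3),
        "latitude": np.arange(2),
    },
)

# rename() returns a new Dataset; it does not modify in place,
# so the result must be assigned back.
ds = ds.rename({"latitude": "lat", "longitude": "lon"})

# Reorder the dimensions to (lat, lon, time).
ds["prcp"] = ds["prcp"].transpose("lat", "lon", "time")

print(ds["prcp"].dims)  # ('lat', 'lon', 'time')
```

The renamed and reordered dataset can then be written back out with `ds.to_netcdf(...)` and used as input to the processing script.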

As @bradleyswilson alludes to above, you can leverage xarray for this and then write the resulting xarray.Dataset object to file. Then use that new NetCDF file as input to this package's main processing script.

I am now able to change the order of the dimensions of the input file and save it. The problem was arising because the file was too big, almost 7 GB. I was using the CHIRPS rainfall dataset. I checked using the CRU rainfall dataset and I am able to change my input file. Thank you @bradleyswilson @monocongo
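For files too large to fit in memory (like the roughly 7 GB CHIRPS file mentioned above), the same rename/transpose can be done lazily by opening the file with dask chunks, so the full array is never loaded at once. A hedged sketch, assuming dask is installed; the file paths, variable name `precip`, and chunk size here are stand-ins, and a tiny demo file is created so the example is self-contained:

```python
import os
import tempfile

import numpy as np
import xarray as xr

tmpdir = tempfile.mkdtemp()
src_path = os.path.join(tmpdir, "precip_demo.nc")
out_path = os.path.join(tmpdir, "precip_fixed.nc")

# Create a small stand-in file (the real CHIRPS file would be ~7 GB).
src = xr.Dataset(
    {"precip": (("time", "latitude", "longitude"), np.random.rand(10, 4, 5))}
)
src.to_netcdf(src_path)

# Open lazily with dask chunks; chunk sizes would be tuned for the real file.
ds = xr.open_dataset(src_path, chunks={"time": 5})
ds = ds.rename({"latitude": "lat", "longitude": "lon"})
ds["precip"] = ds["precip"].transpose("lat", "lon", "time")
ds.to_netcdf(out_path)  # written chunk by chunk, not all at once
```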

To do an automatic conversion, I usually add these lines after every update:

In `_compute_write_index` (`main.py`), after this line:
```python
dataset = xr.open_mfdataset(list(set(files)), chunks=chunks)

# Add this ################################
if 'latitude' in dataset.coords:
    # rename() returns a new Dataset, so the result must be reassigned
    dataset = dataset.rename({'latitude': 'lat', 'longitude': 'lon'})
if 'bnds' in dataset.dims:
    dataset = dataset.drop_vars('time_bnds')
keys = list(dataset.keys())
for key in keys:
    if 'time' in dataset.coords:
        dataset[key] = dataset[key].transpose("lat", "lon", "time")
    else:
        dataset[key] = dataset[key].transpose("lat", "lon")
if 'time' in dataset.coords:
    dataset = dataset[['lat', 'lon', 'time', *keys]]
else:
    dataset = dataset[['lat', 'lon', *keys]]
########################################
```

And in `_prepare_file` (`main.py`), after this line:
```python
ds = xr.open_dataset(netcdf_file)

# Add this ################################
if 'latitude' in ds.coords:
    # rename() returns a new Dataset, so the result must be reassigned
    ds = ds.rename({'latitude': 'lat', 'longitude': 'lon'})
if 'bnds' in ds.dims:
    ds = ds.drop_vars('time_bnds')
keys = list(ds.keys())
for key in keys:
    if 'time' in ds.coords:
        ds[key] = ds[key].transpose("lat", "lon", "time")
    else:
        ds[key] = ds[key].transpose("lat", "lon")
if 'time' in ds.coords:
    ds = ds[['lat', 'lon', 'time', *keys]]
else:
    ds = ds[['lat', 'lon', *keys]]
########################################
```

One more fix that is unrelated:

I usually have to change `pet` in `indices.py`

From:

```python
if (latitude_degrees is not None) and not np.isnan(latitude_degrees) and (-90.0 < latitude_degrees < 90.0):
```

To:

```python
if (latitude_degrees is not None) and not np.isnan(latitude_degrees) and (-90.0 <= latitude_degrees <= 90.0):
```
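The difference only matters for grid cells exactly at the poles: the strict comparison rejects a latitude of ±90.0, while the inclusive one accepts it. A toy illustration of the two checks (standalone functions, not the actual `pet` code):

```python
import numpy as np

def valid_strict(latitude_degrees):
    # original check: poles (±90.0) excluded
    return (latitude_degrees is not None) and not np.isnan(latitude_degrees) \
        and (-90.0 < latitude_degrees < 90.0)

def valid_inclusive(latitude_degrees):
    # proposed check: poles (±90.0) included
    return (latitude_degrees is not None) and not np.isnan(latitude_degrees) \
        and (-90.0 <= latitude_degrees <= 90.0)

print(valid_strict(-90.0), valid_inclusive(-90.0))  # False True
```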

Thanks for helping @maxxpower007! The common fixes you outlined above might be useful for all users -- maybe we should roll them into the main processing script? One limitation, for now, is that there are no proper tests for the main processing script, so it's harder to be sure we haven't broken something if we add code willy-nilly.