monocongo/climate_indices

PET calculation error using CHIRTS data

Closed this issue · 6 comments


Describe the bug
The run fails with "ERROR Failed to complete" right after starting the script.

To Reproduce
Steps to reproduce the behavior:

  1. I used monthly temperature data 1983-2016 from CHIRTS. The data can be downloaded via this link: https://data.chc.ucsb.edu/products/CHIRTSmonthly/netcdf/CHIRTSmax.monthly.nc
  2. Create subset using NCO
    • Crop the bounding box: ncks -d latitude,-23.5,50. -d longitude,60.,180. input.nc -O output.nc
  3. The result from step 2 (which I used as input for the computation) is available from this link
  4. Run the script to compute PET: process_climate_indices --index pet --periodicity monthly --netcdf_temp rbb_cli_chirts_monthly_tmax_1983_2016_d1.nc --var_name_temp Tmax --output_file_base /Output/rbb_cli_chirts_monthly_pet_1983_2016_d1 --multiprocessing all_but_one

Expected behavior
PET computation completed.

Screenshots
Here's the screenshot:
[Screenshot: Screen Shot 2020-11-27 at 2 51 07 PM]

Desktop (please complete the following information):

  • macOS Big Sur

Additional context
I use the climate-indices package inside a dedicated environment specific to indices calculation, installed using Anaconda 4.9.2.
This is my first attempt to calculate PET.

I have frequently used climate-indices to calculate SPI without any problem. I also wrote a step-by-step guide, available at this link.

Thanks for reporting this, @bennyistanto

The issue here looks to be happening when the temperature file is opened with xarray: it doesn't like the time units. I have run into this before -- "months" is not a standard unit for a time coordinate variable. I have worked around it by converting the time coordinate values from months to seconds or minutes (or any other non-month unit) since the reference date. Maybe give that a try? The conversion should be straightforward using NCO or CDO. It looks like this question describes a similar issue and NCO can be used to fix it; hopefully something similar is applicable in your case.

BTW very nice work on the step-by-step guide for SPI!

BTW, have you seen this function? It's an example of how to compute time coordinate values when using "days since <some_reference_date>" as the time coordinate variable's units. You may try that if you're OK with using days as the unit (the data will still be monthly; only the unit is expressed in days).
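As a sketch of that workaround, the "months since" coordinate can be rebuilt and re-expressed in days with xarray and pandas. This uses a toy in-memory dataset and placeholder names (`Tmax`, the reference date), and assumes the time coordinate is regular monthly data; adapt the file I/O to your own files:

```python
import numpy as np
import pandas as pd
import xarray as xr

# Toy dataset with a "months since" time coordinate, which xarray cannot decode.
ds = xr.Dataset(
    {"Tmax": (("time",), np.arange(6, dtype=float))},
    coords={"time": ("time", np.arange(6))},
)
ds["time"].attrs["units"] = "months since 1983-01-01"

# Rebuild a regular monthly datetime index from the reference date embedded in
# the units attribute, then re-express the coordinate in days since that date.
_, reference_date = ds["time"].attrs["units"].split("since")
reference = pd.Timestamp(reference_date.strip())
times = pd.date_range(start=reference, periods=ds.sizes["time"], freq="MS")

ds = ds.assign_coords(time=(times - reference).days)
ds["time"].attrs["units"] = f"days since {reference.date()}"
```

For a real file you would open with `decode_times=False`, apply the same steps, and write the result back out with `to_netcdf`.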

Thanks for the advice. Here's what I tried:

  1. Changed the time unit following the NCO script on Stack Overflow and re-ran the PET script, but hit another error, this time about the dimension names:
    ValueError: Invalid dimensions of the temperature variable: ('time', 'latitude', 'longitude') (valid dimension names and order: [[('lat', 'lon', 'time'), ('time', 'lat', 'lon')], [('time', 'division'), ('division', 'time')]])

  2. Tried renaming both the dimensions and the variables from latitude to lat and longitude to lon using NCO:

  • ncrename -d longitude,lon file.nc
  • ncrename -d latitude,lat file.nc
  • ncrename -v longitude,lon file.nc
  • ncrename -v latitude,lat file.nc

  3. After step 2, file.nc could no longer be opened in Panoply (I always re-check via ncdump -h file.nc and also check the visualisation in Panoply):

[Screenshot: Screen Shot 2020-11-28 at 3 34 39 PM]

  4. I also tried renaming only the dimension or only the variable, but the data looks strange when opened in Panoply:

[Screenshot: Screen Shot 2020-11-28 at 3 30 49 PM]

I was able to resolve this issue by fixing the NetCDF file.

I needed to change the units attribute to "degrees_celsius" and rename the lat/lon coordinate variables and dimensions. For this I used xarray:

import xarray as xr

# Open the original CHIRTS file.
ds = xr.open_dataset("/mnt/c/home/data/climate/CHIRTSmax.monthly.nc")

# Use a units string that climate-indices recognizes.
ds["Tmax"].attrs["units"] = "degrees_celsius"

# Rename the coordinate variables and dimensions to "lat"/"lon".
name_dict = {
    "latitude": "lat",
    "longitude": "lon",
}
ds = ds.rename(name_dict=name_dict)

ds.to_netcdf(path="/mnt/c/home/data/climate/james_CHIRTSmax.monthly.nc")

Then I reduced the resolution (to cut down the amount of data for the test) using NCO (ncpdq), and finally I ran the PET computation script:

$ ncpdq --dmn lon,0,,4 --dmn lat,0,,4 -a lat,lon,time -O james_CHIRTSmax.monthly.nc CHIRTSmax.monthly.extract_for_climate_indices.nc

$ process_climate_indices --index pet --periodicity monthly --netcdf_temp CHIRTSmax.monthly.extract_for_climate_indices.nc --var_name_temp Tmax --output_file_base rbb_cli_chirts_monthly_pet_1983_2016_d1 --multiprocessing all
2020-11-29  17:25:28 INFO Start time:    2020-11-29 17:25:28.334531
2020-11-29  17:25:28 INFO Computing PET
2020-11-29  17:26:31 INFO End time:      2020-11-29 17:26:31.788486
2020-11-29  17:26:31 INFO Elapsed time:  0:01:03.453955

Using Panoply to visualize the results:
[Image: pet_thornthwaite_in_rbb_cli_chirts_monthly_pet_1983_2016_d1_pe]

Thank you @monocongo, I have completed the PET calculation using CHIRTS data, and will use this to calculate SPEI.

[Screenshot: Screen Shot 2020-11-30 at 8 54 55 AM]

As I experienced some errors while preparing the data for the PET calculation, I would like to write a step-by-step guide about this. But I need to clarify some points:

  • Temperature unit.
    CHIRTS writes the unit as "degrees Celcius", and other data providers will probably write it differently, e.g. "degC". I found the input data requirements in an old post, but it has since been removed.
    I assume this is still valid, as you suggested using "degrees_celsius" in the comment above. I think it would be better to include this in the climate-indices documentation again:

    Precipitation
      Units: 'millimeters', 'millimeter', 'mm'; 'inches', 'inch', 'in'
      Data Dimensions (Gridded): "lat", "lon", "time" OR "time", "lat", "lon"
    Temperature
      Units: 'degree_celsius', 'degrees_celsius', 'celsius', 'c'; 'degree_fahrenheit', 'degrees_fahrenheit', 'fahrenheit', 'f'
      Data Dimensions (Climate Divisions): "division", "time"
  • Data Dimension
    CHIRTS uses time, latitude, and longitude, which matches the order given in the table above, but that no longer works; as written in my second comment above, it raises:
    ValueError: Invalid dimensions of the temperature variable: ('time', 'latitude', 'longitude') (valid dimension names and order: [[('lat', 'lon', 'time'), ('time', 'lat', 'lon')], [('time', 'division'), ('division', 'time')]])

  • PET script
    In the old documentation it was written as python process_grid.py --index pet --periodicity monthly --netcdf_temp ../example_inputs/nclimgrid_lowres_tavg.nc --var_name_temp tavg --output_file_base ../results/nclimgrid_lowres --calibration_start_year 1951 --calibration_end_year 2010

    and in the current version it is written as process_climate_indices --index pet --periodicity monthly --netcdf_temp /data/nclimgrid_lowres_tavg.nc --var_name_temp tavg --output_file_base <out_dir>/nclimgrid_lowres --multiprocessing all_but_one

    I am aware we no longer use process_grid.py, but you mentioned "The input dataset is monthly data and the calibration period used will be Jan. 1951 through Dec. 2010." in the current docs.

    Should we add --calibration_start_year XXXX --calibration_end_year XXXX or not?
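The units and dimension checks raised in these points can be sketched as a small pre-flight validation in xarray. The accepted-value lists below are copied from the table above; treat them as illustrative for a guide, not as the package's authoritative source, and `check_temperature_variable` is a hypothetical helper name:

```python
import numpy as np
import xarray as xr

# Accepted values, per the documentation table quoted above (illustrative only).
VALID_TEMP_UNITS = {
    "degree_celsius", "degrees_celsius", "celsius", "c",
    "degree_fahrenheit", "degrees_fahrenheit", "fahrenheit", "f",
}
VALID_GRID_DIMS = {("lat", "lon", "time"), ("time", "lat", "lon")}

def check_temperature_variable(ds: xr.Dataset, var_name: str) -> list:
    """Return a list of problems likely to trip up process_climate_indices."""
    problems = []
    units = ds[var_name].attrs.get("units", "").lower()
    if units not in VALID_TEMP_UNITS:
        problems.append(f"unexpected units: {units!r}")
    if ds[var_name].dims not in VALID_GRID_DIMS:
        problems.append(f"unexpected dimensions: {ds[var_name].dims}")
    return problems

# Toy example mimicking CHIRTS before any fixes: latitude/longitude dimensions
# and a non-standard units string.
ds = xr.Dataset(
    {"Tmax": (("time", "latitude", "longitude"), np.zeros((2, 3, 4)))}
)
ds["Tmax"].attrs["units"] = "degrees Celcius"
problems = check_temperature_variable(ds, "Tmax")
```

Here `problems` flags both the units string and the dimension names; after renaming latitude/longitude to lat/lon and setting units to "degrees_celsius", the same check returns an empty list.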

Your xarray script above for manipulating the variable, dimensions, and units in the NetCDF file really helped me. But I experienced an error: Try opening your dataset with decode_times=False or installing cftime if it is not installed

Installing cftime and adjusting the script based on this answer https://stackoverflow.com/a/55686749 solved my problem, with no need to convert the time coordinate unit to days. The result is accepted as PET input.

import pandas as pd
import xarray as xr

# Open without decoding times, since xarray cannot decode the "months since"
# units in the original file.
ds = xr.open_dataset("/path/input.nc", decode_times=False)

# Rebuild a proper monthly datetime coordinate from the reference date
# embedded in the units attribute.
units, reference_date = ds.time.attrs['units'].split('since')
ds['time'] = pd.date_range(start=reference_date, periods=ds.sizes['time'], freq='MS')

# Use a units string that climate-indices recognizes.
ds["Tmax"].attrs["units"] = "degrees_celsius"

# Rename the coordinate variables and dimensions to "lat"/"lon".
name_dict = {
    "latitude": "lat",
    "longitude": "lon",
}
ds = ds.rename(name_dict=name_dict)

ds.to_netcdf(path="/path/output.nc")

Yes, there needs to be documentation regarding the various requirements on the input NetCDF files. I'm not sure why it was removed from the documentation in previous edits.

If the calibration years are not specified then the entire period of record is used as the calibration period. The calibration period from 1951-2010 mentioned in the documentation for PET monthly processing is in error (a botched copy/paste from the example above), thanks for finding this issue.