
DOC: Example of thresholds functionality?

  • icclim version: 6.4.0
  • Python version: 3.10.8


I'm trying to use the thresholds functionality described here and not having success, so I was wondering if it would be possible to add an example to the documentation?

Minimal reproducible example

In this example (see full notebook), I want to calculate the TX90p index for a future period using the same 90th percentile thresholds that I used for a historical period.

import icclim
import xarray as xr

hist_files = [
ds_hist = xr.open_mfdataset(hist_files)


ds_thresholds = xr.open_dataset('TX90p.nc')
ds_thresholds = ds_thresholds[['tasmax_thresholds']].squeeze()
ds_thresholds = ds_thresholds.rename({'tasmax_thresholds': 'tasmax'})

ssp_files = [
ds_ssp = xr.open_mfdataset(ssp_files)

    in_files={'tasmax': ds_ssp, 'thresholds': ds_thresholds},

Output received

2023-08-25 16:42:09,330 --- icclim 6.4.0
2023-08-25 16:42:09,333 --- BEGIN EXECUTION
2023-08-25 16:42:09,334 Processing: 0%

MergeError: conflicting values for variable 'tasmax' on objects to be combined. You can skip this check by specifying compat='override'.

bzah commented

Hi @DamienIrving and sorry for the late reply.
To me it seems the error is due to the merge of the 3 ssp370 files.
I guess there is an issue with the coordinates or attributes of the tasmax variable in one of the files.
Or icclim uses the wrong merge strategy, in which case we need to fix this.

I can only find aggregated data for tasmax_day_ACCESS-ESM1-5_ssp370_r6i1p1f1_gn_20650101-21001231.nc.
If you have a link to the 3 ssp files that you used, that would be helpful to reproduce the issue. Otherwise I will try to do it with another dataset.
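One way to check the first hypothesis is to compare the time steps of each input before merging. The following is only a sketch: it uses small synthetic arrays as stand-ins for the three ssp370 files, and `make_da` and `find_time_overlaps` are hypothetical helper names, not part of icclim or xarray.

```python
import numpy as np
import pandas as pd
import xarray as xr


def make_da(start, value, ndays=365 * 2):
    """Build a small synthetic daily tasmax series (stand-in for one file)."""
    return xr.DataArray(
        data=np.full((ndays, 1, 1), value, dtype=float),
        dims=["time", "lat", "lon"],
        coords={
            "time": pd.date_range(start, periods=ndays, freq="D"),
            "lat": [42],
            "lon": [42],
        },
        name="tasmax",
        attrs={"units": "K"},
    )


def find_time_overlaps(arrays):
    """Return (i, j, n_shared) for every pair of arrays that share time steps."""
    overlaps = []
    for i in range(len(arrays)):
        for j in range(i + 1, len(arrays)):
            shared = arrays[i].indexes["time"].intersection(arrays[j].indexes["time"])
            if len(shared) > 0:
                overlaps.append((i, j, len(shared)))
    return overlaps


# Three synthetic "files": the first two overlap, the third does not.
parts = [make_da("2043-01-01", 1), make_da("2044-01-01", 2), make_da("2046-01-01", 3)]
overlaps = find_time_overlaps(parts)
print(overlaps)  # [(0, 1, 365)]
```

With real files, each entry of `parts` would instead be the tasmax variable of one opened dataset. An empty result would suggest the time axes are disjoint and the conflict lies elsewhere, for example in another coordinate.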

bzah commented

After some experimentation, this is indeed not an issue with icclim.
I was able to reproduce the error with the following MRE, using xarray only. The MergeError is raised if the datasets subject to the merge (the 3 ssp370 datasets in your case) have overlapping dimensions and incompatible values for the overlaps.

In the case below, this is caused by the time dimension overlapping for year 2044 in both datasets, with different values in the overlap (1 in da1 and 2 in da2).

import numpy as np
import xarray as xr
import pandas as pd

# First array: daily values of 1, starting in 2043 and running through 2044.
da1 = xr.DataArray(
    data=np.full((365 * 2, 1, 1), 1),
    dims=["time", "lat", "lon"],
    coords={
        "lat": [42],
        "lon": [42],
        "time": pd.date_range("2043-01-01", periods=365 * 2, freq="D"),
    },
    name="tasmax",  # xr.merge needs named DataArrays
    attrs={"units": "K"},
)

# Second array: daily values of 2, starting in 2044, so 2044 is in both arrays.
da2 = xr.DataArray(
    data=np.full((365 * 2, 1, 1), 2),
    dims=["time", "lat", "lon"],
    coords={
        "lat": [42],
        "lon": [42],
        "time": pd.date_range("2044-01-01", periods=365 * 2, freq="D"),
    },
    name="tasmax",
    attrs={"units": "K"},
)

xr.merge([da1, da2])
# ^ MergeError
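If the overlap is expected, for example because consecutive files repeat some time steps, one possible workaround is to drop the duplicated time steps before handing the data to icclim. The sketch below assumes the values from the first array should win in the overlap; that is an assumption to check against your data, and `make_da` is a hypothetical helper used only to build synthetic input. The `compat` and `join` arguments are passed explicitly so the behaviour does not depend on the xarray defaults.

```python
import numpy as np
import pandas as pd
import xarray as xr


def make_da(start, value, ndays=365 * 2):
    """Build a small synthetic daily tasmax series (stand-in for one file)."""
    return xr.DataArray(
        data=np.full((ndays, 1, 1), value, dtype=float),
        dims=["time", "lat", "lon"],
        coords={
            "time": pd.date_range(start, periods=ndays, freq="D"),
            "lat": [42],
            "lon": [42],
        },
        name="tasmax",
        attrs={"units": "K"},
    )


da1 = make_da("2043-01-01", 1)
da2 = make_da("2044-01-01", 2)

# A plain merge fails: the shared time steps hold different values.
try:
    xr.merge([da1, da2], compat="no_conflicts", join="outer")
    merge_failed = False
except xr.MergeError:
    merge_failed = True

# Concatenate along time, then keep only the first occurrence of each time step,
# so da1 takes priority wherever the two arrays overlap.
stacked = xr.concat([da1, da2], dim="time")
keep = ~stacked.get_index("time").duplicated(keep="first")
combined = stacked.isel(time=keep)
```

Here `combined` has 1095 unique time steps, and the overlapping part of 2044 keeps the value 1 from `da1`. Passing `compat='override'`, as the error message suggests, would also silence the check, but it skips the comparison entirely rather than making the choice explicit.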


