cerfacs-globc/icclim

BUG: Problem with kg m-2 s-1 units for FractionOfTotal indicators

DamienIrving opened this issue · 10 comments

  • icclim version: 6.1.3
  • Python version: 3.10.8

Description

When I try and calculate a FractionOfTotal indicator (e.g. R75pTOT, R95pTOT, R99pTOT) with input units kg m-2 s-1 the output data is all inf and nan. This problem doesn't happen with any other precipitation indices when the input data has units kg m-2 s-1, nor does it happen with these FractionOfTotal indices if the input units are mm d-1.

Minimal reproducible example

Based on Example 2: Index ETR...

files_pr = [
    "pr_day_CNRM-CM5_historical_r1i1p1_19300101-19341231.nc",
    "pr_day_CNRM-CM5_historical_r1i1p1_19350101-19391231.nc",

out_f = "R75pTOT_year_CNRM-CM5_historical_r1i1p1_1930-1939.nc"  # OUTPUT FILE: annual values of R75pTOT

icclim.index(
    index_name="R75pTOT",
    in_files=files_pr,
    var_name="pr",
    slice_mode="year",
    out_file=out_f,
)

Output received

import xarray as xr

ds = xr.open_dataset('R75pTOT_year_CNRM-CM5_histrorical_r1i1p1_1930-1939.nc')

print(ds['R75pTOT'].values)
[[[nan nan nan ... nan nan nan]
  [nan nan nan ... nan nan nan]
  [nan nan nan ... nan nan nan]
   ...
  [inf inf inf ... inf inf inf]
  [inf inf inf ... inf inf inf]
  [inf inf inf ... inf inf inf]]

...
bzah commented

Hi Damien,

This is a bit nasty:

First, R75pTOT is basically:

r75ptot = sum_pr_above_75p / total_pr

For a division with numpy, nan is outputted if both sum_pr_above_75p and total_pr are equal to 0, see np.asarray(0) / np.asarray(0).
inf (for infinite) is outputted if sum_pr_above_75p is not 0 but total_pr is equal to 0, see np.asarray(0.1) / np.asarray(0).

Then, most precipitation indices takes only into account values above 1 mm/day (equivalent to 1.1574074074074072e-05 kg m-2 s-1 in your case). Every values below this threshold is not counted to compute the percentile itself, nor to sum the total_pr. But they are summed into sum_pr_above_75p.

So the nan case is expected if, for a given location, every precipitation is 0 mm/day.
The inf case is "expected" if, for a given location, the 75th percentile is close to zero, all values are below 1 mm/day but there are values above 75p and below 1 mm/day.

@DamienIrving can you confirm that in your dataset there are locations where precipitation is always inferior to 1 mm/day and locations where it's always 0 ? The best would be to check if they are indeed the locations where you see nans and infs.

@pagecp I think this behavior is confusing and we should output nan in both cases.
To fix this, we can either:

  • Compute total on every values (including those below 1 mm/day).
  • Compute sum_pr_above_75p only on values above 1 mm/day.

edit: This means that these indices, when computed on very dry regions and/or seasons, could output different results.

Hi,
I am mostly following [parts of] of what is going on in this repo to see if/when issues pop up that may have some bearing on index definition/ specification, as well as performance in edge/corner cases (like "extreme" climates and more). So, just two questions of understanding:

@DamienIrving: I was intrigued to see that you find that FractionOfTotal seems to work for one input data unit (mm d-1) but not for another (kg m-2 s-1). I guess that might be because the datasets themselves were different --- or was it actually the same dataset that you just converted units?

@bzah: Do you have any insight whether the seemingly inconsistent inclusion/exclusion of precipitation values below 1 mm d-1 in the calculation be traced back to previous code generations?

Thanks,
Lars

bzah commented

Hi @larsbarring,

This is inconsistency is present since v6.0, but was not in icclim v5.4 because, at the time, we were relying on xclim fraction_over_precip_thresh which sum values above 1 mm/day on both sum_pr_above_75p and total_pr (for xclim 0.37 that we were using at that time).
I didn't dive down to icclim v4, do you need this information for a specific version of icclim ?

Hi Abel,

Thanks for the quick response! No need to dig any further. I was mainly interested to see whether this could be traced back all the way back to the various R (and fortran) packages.
/Lars

bzah commented

As per discussion with @pagecp, we should only consider the days above 1 mm/day in every part of the calculation.

Original message (FR):

A la relire à nouveau, je crois que scientifiquement c’est mieux de ne considérer que les jours avec au moins 1 mm/day, car quand on analyse un percentile, on ne veut que les jours avec précip, du coup il faudrait tout calculer avec les wet days.

I will try to make this fix asap.

bzah commented

This has been fixed in icclim 6.3.0 which is now out.
Feel free to reopen if needed.

I'm still having the same problem as described in my initial post to this thread using icclim version 6.4.0 (see example notebook) so I think this issue needs to be reopened.

Sorry, meant to tag @bzah and @larsbarring

bzah commented

I'm short on time at the moment, but I'm reopening so that we don't forget to fix this.

Hi, sorry for the late we reply, we are just resuming development on icclim.
I was able to reproduce the issue using the same dataset that you used in your notebook (pr_day_MPI-ESM1-2-LR_historical_r1i1p1f1_gn_18500101-18691231.nc).
So it seems there indeed something that my simple tests were not able to reproduce. I will continue the investigation.