cerfacs-globc/icclim

ENH: Add option to resample between two dates using `slice_mode`

bzah opened this issue · 6 comments

bzah commented
  • icclim version: 5.2.0
  • Python version: n/a

Description

Some upcoming indices to be implemented for Météo-France will need to be computed between two exact dates.
For example:

Summer water balance, from June 15 to August 25.

The pandas/xarray resampling feature does not natively accept two dates as a valid input.
However, we can use something like da.resample(time="60D") to resample blocks of 60 days.
We can also use the xclim indexer with date_bounds=("06-15", "08-25"). However, this would require some specific development in xclim to enable the feature.
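To see why a fixed-length block such as "60D" does not answer the need, here is a small sketch on one synthetic year of daily data. With xarray's default resampling origin, blocks are anchored on the first time step rather than on a calendar date, so none of them starts on June 15.

```python
import pandas as pd
import xarray as xr

# One year of synthetic daily data (the values are irrelevant here).
time = pd.date_range("2001-01-01", "2001-12-31", freq="D")
da = xr.DataArray([1.0] * time.size, coords={"time": time}, dims="time")

# Fixed-length blocks: the first block starts on the first time step and
# each following block starts 60 days after the previous one.
blocks = da.resample(time="60D").sum()
block_starts = [str(t)[:10] for t in blocks.time.values]
print(block_starts[:4])  # → ['2001-01-01', '2001-03-02', '2001-05-01', '2001-06-30']
```

Shifting the origin could align one block on June 15, but the block length would still not follow the "June 15 to August 25" window from one year to the next.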

So to enable this feature we must:

  • Convert both dates to their day-of-year (doy) values
  • Filter the dataset to keep, for each year, only the data between these two doy values
  • Resample using "AS-{start_month}"
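The three steps above can be sketched with xarray as follows. This is a minimal sketch, not icclim's implementation: `resample_between_dates` is a hypothetical helper that assumes a standard (pandas-backed) calendar, bounds given as "MM-DD" strings with the start before the end in the same year, and a sum as the aggregation. The yearly anchor is written "YS-<MON>", the newer pandas spelling of "AS-<MON>".

```python
import pandas as pd
import xarray as xr

_MONTHS = ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
           "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"]


def resample_between_dates(da: xr.DataArray, start: str, end: str) -> xr.DataArray:
    """Yearly sum of `da` between two "MM-DD" dates (hypothetical helper)."""
    # Step 1: convert both bounds to day-of-year values using a non-leap
    # reference year (2001), e.g. "06-15" -> 166.
    start_ts = pd.Timestamp(f"2001-{start}")
    end_ts = pd.Timestamp(f"2001-{end}")
    start_doy, end_doy = start_ts.dayofyear, end_ts.dayofyear

    # Step 2: keep, for each year, only the time steps between the two doy.
    # Caveat: in leap years, a doy after Feb 28 points one calendar day
    # earlier, so the window becomes June 14 to August 24.
    doy = da.time.dt.dayofyear
    in_window = (doy >= start_doy) & (doy <= end_doy)
    filtered = da.where(in_window, drop=True)

    # Step 3: resample on years anchored on the start month, so that each
    # yearly window falls in a single bin.
    freq = f"YS-{_MONTHS[start_ts.month - 1]}"
    return filtered.resample(time=freq).sum()
```

For "06-15" to "08-25" the resampling frequency is "YS-JUN", and each yearly total covers 72 days in non-leap years.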

Remarks:

  • February 29th will shift the end date of the index by one day if the distance between the two dates is not corrected specifically for leap years.
    This will probably be a hassle to handle, so we should just raise a warning when February 29th falls between the two dates.
  • For now, the slice_mode custom resampler algorithm is sub-optimal: for a custom season, for example, we resample to months, compute the index, then filter out the unnecessary months. It would be great to have a better implementation when working on this new feature. (Done in #168.)
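A minimal sketch of the proposed warning, assuming both bounds are "MM-DD" strings within the same year; `warn_if_feb29_between` is a hypothetical name. Comparing (month, day) tuples avoids building actual dates in any particular calendar.

```python
import warnings


def warn_if_feb29_between(start: str, end: str) -> bool:
    """Warn when Feb 29 falls between two "MM-DD" bounds (hypothetical helper)."""
    # "06-15" -> (6, 15); tuples compare month first, then day.
    start_md = tuple(int(part) for part in start.split("-"))
    end_md = tuple(int(part) for part in end.split("-"))
    crosses = start_md <= (2, 29) <= end_md
    if crosses:
        warnings.warn(
            f"February 29th lies between {start} and {end}: in leap years "
            "the end date of the index may be shifted by one day.",
            stacklevel=2,
        )
    return crosses
```

With "06-15" and "08-25" no warning is raised, whereas "01-15" to "03-15" triggers it.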
bzah commented

It seems this feature is already available in xclim (see Ouranosinc/xclim#1069), but I couldn't find its documentation. The feature is quite new, which might be why it is not documented yet; it was introduced along with the indexer handling.

xclim handles all the cases of slice_mode:

  • "month" filtering to specific months and "MS" resampling
  • "season" with any of the string keywords except AMJJAS and ONDJFM
  • "season" with specific months. For this, the user must combine two xclim parameters: month=[2,3] and freq="AS-FEB"
  • "season" with 2 specific dates (not yet on icclim)
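The slice_mode cases listed above could be translated into xclim arguments along these lines. This is a hypothetical sketch: the slice_mode shapes (in particular ["season", ("06-15", "08-25")] for the between-dates case) and the function name are assumptions, and the returned dict is meant to be passed to an xclim indicator as `freq` plus one indexer (`month` or `date_bounds`). String seasons are turned into month lists, following the month=[2,3] + freq="AS-FEB" pattern above, which also covers AMJJAS and ONDJFM.

```python
_MONTHS = ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
           "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"]
# Two years of month initials, so seasons such as "DJF" can wrap around.
_INITIALS = "JFMAMJJASOND" * 2


def slice_mode_to_xclim_kwargs(slice_mode):
    """Map an icclim-like slice_mode to xclim freq + indexer kwargs (hypothetical)."""
    # String season keyword, e.g. "JJA", "DJF" or "ONDJFM".
    if isinstance(slice_mode, str):
        start = _INITIALS.find(slice_mode.upper())
        if start == -1 or not 1 < len(slice_mode) <= 12:
            raise ValueError(f"Unknown season keyword: {slice_mode!r}")
        months = [(start + i) % 12 + 1 for i in range(len(slice_mode))]
        return {"freq": f"YS-{_MONTHS[months[0] - 1]}", "month": months}

    kind, values = slice_mode
    if kind == "month":
        # Filter on the given months and resample monthly.
        return {"freq": "MS", "month": list(values)}
    if kind == "season" and all(isinstance(v, str) for v in values):
        # Between two "MM-DD" dates: yearly bins anchored on the start month.
        start, end = values
        start_month = int(start.split("-")[0])
        return {"freq": f"YS-{_MONTHS[start_month - 1]}", "date_bounds": (start, end)}
    if kind == "season":
        # Season given as a list of months, e.g. [2, 3].
        months = list(values)
        return {"freq": f"YS-{_MONTHS[months[0] - 1]}", "month": months}
    raise ValueError(f"Unsupported slice_mode: {slice_mode!r}")
```

Anchoring the yearly frequency on the first month of the season keeps seasons that span the new year, such as DJF, within a single bin.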

The implementation for "between 2 dates" I wrote in the /enh/in_between_years_season branch (see 1ab7c6a) is not finished, but I was going down roughly the same path as xclim's by converting dates to their day_of_year and filtering the data between those two d.o.y. values. I believe the xclim devs did a better job than I did by handling different calendars.
The main advantage of our implementation is the creation of a time_bounds DataArray, but this could be integrated as a post-processing step after the xclim indicator call. Another minor plus is the addition of the dateparser library, which allows dates such as "14 august" and handles i18n dates such as "3 juillet".
But a big limitation of our implementation, as stated in #168, is that we can't use xclim's missing-values checking mechanism.


The best approach now is probably to keep only the time_bounds creation in icclim (maybe even try to back-port it to xclim) and parse slice_mode to call xclim with the proper arguments to create seasons (between dates, specific months, string seasons, etc.).
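Keeping time_bounds creation as a post-processing step could look like the minimal sketch below for a between-dates season. The helper name, the "time_bnds"/"bnds" names (a common CF-style convention) and the use of each window's start date as the time coordinate are assumptions for illustration, not icclim's actual output.

```python
import numpy as np
import pandas as pd
import xarray as xr


def between_dates_time_bounds(years, start="06-15", end="08-25"):
    """Build a (time, bnds) DataArray of [start, end] dates per year (hypothetical)."""
    # One start and one end timestamp for every year of the output.
    starts = pd.to_datetime([f"{year}-{start}" for year in years])
    ends = pd.to_datetime([f"{year}-{end}" for year in years])
    # Column 0 holds the window starts, column 1 the window ends.
    bounds = np.stack([starts.values, ends.values], axis=1)
    return xr.DataArray(
        bounds,
        dims=("time", "bnds"),
        coords={"time": starts},
        name="time_bnds",
    )
```

Because it only needs the years and the two bounds, such a function can run after the xclim indicator call, independently of how the index itself was computed.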

Besides, I think it's better to hide both freq and indexer under a single parameter, as icclim does with slice_mode. One thing that would definitely be worth adding to slice_mode is the possibility of passing a pandas frequency ("M", "AS-FEB", "WS", ...) directly.

For info, there is already this generic function as well: https://xclim.readthedocs.io/en/stable/indices.html?highlight=aggregate_between#xclim.indices.generic.aggregate_between_dates
It could be interesting to see whether it could be wrapped into the Indicator class.

@bzah I thought I would just add that perhaps the biggest difference (and benefit) between the generic aggregate_between_dates and the recent indexer additions in the xclim Indicator class is that aggregate_between_dates accepts arrays of start and end dates as inputs (i.e., dates that vary spatially over a large territory). A specific example would be calculating accumulated precipitation within a growing season.
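The idea of spatially varying bounds can be illustrated with plain xarray broadcasting. This is not xclim's `aggregate_between_dates` implementation, only a sketch of the concept: `sum_between_doy` is a hypothetical helper that takes inclusive day-of-year bounds as DataArrays over the spatial dimension and sums each year's values inside each gridpoint's own window.

```python
import pandas as pd
import xarray as xr


def sum_between_doy(da, start_doy, end_doy):
    """Yearly sum between per-gridpoint day-of-year bounds (hypothetical helper)."""
    doy = da.time.dt.dayofyear
    # doy has dim "time" and the bounds have spatial dims, so broadcasting
    # builds a (time, space) mask with a different window per gridpoint.
    mask = (doy >= start_doy) & (doy <= end_doy)
    # Values outside each window become NaN and are skipped by the sum.
    return da.where(mask).groupby("time.year").sum()


# Usage: two gridpoints with different growing seasons (synthetic data).
time = pd.date_range("2001-01-01", "2001-12-31", freq="D")
pr = xr.DataArray([[1.0, 1.0]] * time.size, coords={"time": time}, dims=("time", "loc"))
start = xr.DataArray([100, 150], dims="loc")  # first day of the season per gridpoint
end = xr.DataArray([110, 200], dims="loc")  # last day of the season per gridpoint
result = sum_between_doy(pr, start, end)  # one total per year and gridpoint
```

With one unit of precipitation per day, the first gridpoint accumulates 11 over days 100 to 110 and the second 51 over days 150 to 200.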

bzah commented

Oh! I didn't catch that, thank you Travis for clarifying it; that's very interesting indeed. I think it would make sense to integrate it in icclim as a user_index operator. I don't think we should wrap it in icclim's ECA&D indices until it is available in the xclim Indicator class.
What do you think @pagecp?

Yes, I think it is a very interesting (and very useful) concept to have different start and end dates depending on the gridpoint. I agree that including it in a user_index operator right away is very appealing in any case.

Regarding ECA&D indices, I think they assume by default the same start and end dates for all gridpoints (and the same could be said for thresholds, as in SU), even though their strict definitions could allow such spatial variations (they only consider time series in their definitions). This could wait for the xclim Indicator implementation, as there is no urgency to have this spatial variation for the current work.

Scientifically, it would be quite useful to allow different start and end dates depending on the gridpoint, and likewise for thresholds on field values (not percentiles). We "just" need proper metadata...

bzah commented

I'm migrating these discussions about aggregate between dates to a dedicated issue: #170