Lilio for ensemble forecasts
semvijverberg opened this issue · 2 comments
Lilio works very well when setting up calendars for reanalysis data (time, lat, lon). However, there is a large 'post-processing' community that is not accommodated by this setup.
There workflow would for example look like:
import climetlab, lilio
import pandas as pd
ds = climetlab.load_dataset("s2s-ai-challenge-training-input",
date=[20200102],
origin='ecmwf',
parameter='t2m',
format='netcdf')
#%%
da = ds.to_xarray()
# %%
anchor_str = pd.to_datetime(str((da.forecast_time[0]).values)).strftime('%m-%d')
cal = lilio.Calendar(anchor=anchor_str)
cal.add_intervals("target", gap="14d", length="14d")
cal.map_years(2000, 2000)
cal.visualize()
lilio.resample(cal, da)
The format of the dataset looks like:
This simple trick does not cut it, because valid_time is a coordinate (not a dim).
lilio.resample(cal, da.rename({"valid_time": "time"}))
I believe the issue is, is that valid_time is a function of forecast_time and lead-time. For each forecast/lead time pair, there is another set of valid_times. However, valid_times are the most 'intuitive' dates to use when you want to create a lilio calendar.
This is a wacky solution:
da_test = da.isel(forecast_time=0)
da_zz = da_test.rename({"lead_time": "time"})
da_zz["time"] = ("time", da_zz.valid_time.values)
lilio.resample(cal, da_zz)
In a way, we may not have to map the lilio calendar to EC46 format, since how EC46 is already kind of nice.
For example, you can easily select lead_time week 3 mean (with a gap of 14 days) using:
da["lead_time"] = ("lead_time", da["lead_time"].values.astype(int)) # <- make slicing possible
da.sel(lead_time=slice(14,21)).mean(dim="lead_time")
The translation from this analysis to lilio is provided by the existing calendar_shifter.ipynb.
We (Sem & Jannes) will play around with this in the coming 2 months.
I didn't know climetlab
, nice.