Split meteo into daily chunks
Closed this issue · 4 comments
brey commented
Currently we have a way to split the 3 day meteo dataset in daily chunks with
if split_by:
times, datasets = zip(*self.meteo.Dataset.groupby("time.{}".format(split_by)))
mpaths = ["sflux_air_{}.{:04d}.nc".format(m_index, t + 1) for t in np.arange(len(times))]
for das, mpath in list(zip(datasets, mpaths)):
self.to_force(
das, vars=["msl", "u10", "v10"], rpath=path, filename=mpath, date=self.rdate, **kwargs
)
else:
self.to_force(self.meteo.Dataset, vars=["msl", "u10", "v10"], rpath=path, **kwargs)
which outputs 4 daily chunks.
However, there is a problem with the groupby
functionality.
One can see it below:
import pandas as pd
import xarray as xr
tt = pd.date_range('2023-5-30','2023-6-2')
tt.values
array(['2023-05-30T00:00:00.000000000', '2023-05-31T00:00:00.000000000',
'2023-06-01T00:00:00.000000000', '2023-06-02T00:00:00.000000000'],
dtype='datetime64[ns]')
d = xr.Dataset(
{
"time": (("time"), tt.values),
}
)
d.groupby("time.day")
DatasetGroupBy, grouped over 'day'
4 groups with labels 1, 2, 30, 31.
d.groupby("time.dayofyear")
DatasetGroupBy, grouped over 'dayofyear'
4 groups with labels 150, 151, 152, 153.
tt2 = pd.date_range('2023-12-30','2024-1-2')
d2 = xr.Dataset(
{
"time": (("time"), tt2.values),
}
)
d2.groupby("time.dayofyear")
DatasetGroupBy, grouped over 'dayofyear'
4 groups with labels 1, 2, 364, 365.
Using dayofyear
solves it for the monthly occurrence but we get the same for cross years.
pmav99 commented
times = pd.date_range("2023-12-30", "2024-01-02", freq="h")
ds = xr.Dataset(
{
"time": (("time"), times),
}
)
ds.resample(time="1D")
pmav99 commented
By the way, since from our tests we saw that split by day gives some performance boost, should we perhaps enable it by default?
brey commented
I don't think so. If there is a gap in time, resample
will interpolate, thus increasing the wall time. Also, if the file is small (regional window) there might not be a benefit. We should document this well and let the user decide.