pangeo-data/climpred

Using datatrees to represent datasets

abkfenris opened this issue · 3 comments

Is your feature request related to a problem? Please describe.

Few forecasts are stored as init x lead. That means either users have to do extra dataset wrangling, or data providers are asked to store an additional copy of the data.

Describe the solution you'd like

Datatree is a project to create a tree-like data structure for Xarray. Datatrees can correspond to NetCDF groups or other hierarchies of datasets.

One way datatrees can be used is to collect related but non-alignable datasets. I think this property could make datatree useful here, since datasets can be stored within a tree the same way they are structured on disk. A datatree accessor can then be used to aggregate and reshape the underlying datasets for access and analysis.
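To make the "non-alignable datasets" idea concrete, here is a library-free sketch. Plain nested dicts stand in for datatree nodes (this is not the datatree API), keyed by hypothetical group paths; `runs`, `wave_height`, and `to_init_lead` are illustrative names only. Each run covers different valid times, so the runs can't be stacked into one array as-is, but an accessor-like function can regroup them by init and lead.

```python
from datetime import datetime, timedelta

# Hypothetical tree of forecast runs, keyed by group path the way a datatree
# (or NetCDF groups on disk) might store them. Each run has its own valid
# times and length, so the runs don't align along a shared time axis.
runs = {
    "/run_2023-01-01T00": {
        "init": datetime(2023, 1, 1, 0),
        "valid_time": [datetime(2023, 1, 1, h) for h in (0, 6, 12, 18)],
        "wave_height": [1.0, 1.2, 1.4, 1.1],
    },
    "/run_2023-01-01T12": {
        "init": datetime(2023, 1, 1, 12),
        "valid_time": [datetime(2023, 1, 1, 12), datetime(2023, 1, 1, 18)],
        "wave_height": [1.5, 1.3],
    },
}


def to_init_lead(tree, var):
    """Regroup per-run series into {init: {lead_hours: value}} (hypothetical helper)."""
    table = {}
    for node in tree.values():
        init = node["init"]
        row = table.setdefault(init, {})
        for valid, value in zip(node["valid_time"], node[var]):
            # lead = valid_time - init, expressed as whole hours
            lead_hours = (valid - init) // timedelta(hours=1)
            row[lead_hours] = value
    return table


table = to_init_lead(runs, "wave_height")
```

Once keyed by lead, both runs share the same lead axis (0, 6, ... hours); in an xarray-based version the leads the later run doesn't cover would become NaN after concatenating along `init`.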

I've started exploring using datatrees for forecasts in xarray_fmrc. I've initially modeled it on THREDDS forecast model run collections, but I think it could support other forecast presentations, like climpred's init x lead dataset structure.

I'm mainly coming at this with my data provider hat on, so input from researchers would be really nice (most of my forecast users are fishermen, sailors, surfers, and other folks on the water and around the waterfront, not scientists). There is also an ongoing discussion on the Pangeo Discourse.

Describe alternatives you've considered

Lots of individual datasets in ERDDAP, making users assemble things themselves.

Additional context

Relevant datatree links:

Thanks for your interest in climpred @abkfenris

We created climpred before xdatatree was introduced. Using xdatatree instead of inventing the PredictionEnsemble would have saved us a lot of time and code. As both of the core maintainers are now out of academia, we will not do large refactorings. If you are interested, feel free to adapt climpred to your needs, and we can give some guidance.

Hi @aaronspring, I definitely understand that you're not up for refactoring it if it's not something you use day to day. I can't really justify taking on a refactoring that big either, but I can try to make sure it's easy to kick out a climpred-compatible dataset from xarray_fmrc.

to_climpred() sounds great.

Your datasets seem to have the dimensions forecast_reference_time and valid_time, with forecast_period as a coordinate. One-dimensional coordinates can easily be swapped in as dimensions; see the similar https://climpred.readthedocs.io/en/stable/api/climpred.utils.convert_init_lead_to_valid_time_lead.html#climpred.utils.convert_init_lead_to_valid_time_lead
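For a single run, forecast_period is one-dimensional along valid_time, so xarray's `ds.swap_dims({"valid_time": "forecast_period"})` is enough. With several runs on one (init, valid_time) grid, forecast_period becomes 2D and each row has to be re-indexed by lead instead. The numpy-only sketch below shows that index arithmetic on made-up values; `valid_time_to_lead` is a hypothetical helper, not part of climpred or xarray_fmrc, and it does not call the linked climpred utility (which handles a related conversion).

```python
import numpy as np

# Hypothetical 2D FMRC-style layout: rows are forecast_reference_time (init),
# columns are valid_time, NaN where a run has no value for that valid time.
init = np.array(["2023-01-01T00", "2023-01-01T12"], dtype="datetime64[ns]")
valid_time = np.array(
    ["2023-01-01T00", "2023-01-01T06", "2023-01-01T12", "2023-01-01T18"],
    dtype="datetime64[ns]",
)
values = np.array([
    [1.0, 1.2, 1.4, 1.1],
    [np.nan, np.nan, 1.5, 1.3],
])


def valid_time_to_lead(values, init, valid_time):
    """Re-index an (init, valid_time) array onto an (init, lead) grid."""
    # forecast_period = valid_time - init, shape (n_init, n_valid)
    period = valid_time[np.newaxis, :] - init[:, np.newaxis]
    # Sorted, de-duplicated leads among the cells that actually hold data
    leads = np.unique(period[~np.isnan(values)])
    out = np.full((init.size, leads.size), np.nan)
    for i in range(init.size):
        for j in range(valid_time.size):
            if not np.isnan(values[i, j]):
                k = np.searchsorted(leads, period[i, j])
                out[i, k] = values[i, j]
    return out, leads


out, leads = valid_time_to_lead(values, init, valid_time)
```

After the reshape both runs start at lead 0, and the leads the 12Z run doesn't reach are left as NaN, which matches the init x lead layout climpred expects.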

climpred also recognizes these standard_names in the attributes and renames the dimensions automatically. A lead in units of ns is also converted internally by climpred.
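As a rough illustration of those two steps (not climpred's actual code), the sketch below maps dimensions to names via their `standard_name` attribute and turns `timedelta64[ns]` leads into whole hours plus a units string. The mapping `STANDARD_NAME_TO_DIM` is an assumption based on the names mentioned in this thread; check climpred's documentation for the names it actually recognizes. `climpred_dim_names` and `lead_from_ns` are hypothetical helpers.

```python
import numpy as np

# Assumed CF standard_name -> climpred dimension name mapping (illustrative only).
STANDARD_NAME_TO_DIM = {
    "forecast_reference_time": "init",
    "forecast_period": "lead",
}


def climpred_dim_names(attrs_by_dim):
    """Map each dimension to a new name based on its standard_name attribute."""
    return {
        dim: STANDARD_NAME_TO_DIM.get(attrs.get("standard_name"), dim)
        for dim, attrs in attrs_by_dim.items()
    }


def lead_from_ns(lead_ns, unit="h"):
    """Convert timedelta64 leads to integers in `unit`, plus a units string."""
    names = {"h": "hours", "D": "days"}
    step = np.timedelta64(1, unit)
    # Refuse leads that are not whole multiples of the target unit
    if np.any(lead_ns % step != np.timedelta64(0, "ns")):
        raise ValueError(f"leads are not whole {names[unit]}")
    return (lead_ns // step).astype(int), names[unit]


renames = climpred_dim_names({
    "time": {"standard_name": "forecast_reference_time"},
    "step": {"standard_name": "forecast_period"},
    "x": {},
})
lead, units = lead_from_ns(
    np.array([0, 6, 12], dtype="timedelta64[h]").astype("timedelta64[ns]")
)
```

With xarray, the resulting `renames` dict could be passed to `Dataset.rename`, and `units` stored in the lead coordinate's `attrs`.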