xarray-contrib/xwrf

[Bug]: Assignment of destaggered variable changes Dataset coordinate attributes

Closed this issue · 3 comments

lpilz commented

What happened?

I'm still working on updating the tutorial and blogpost and stumbled upon another issue which I can't quite explain. Somehow, destaggering a data_var individually and assigning it back changes the attribute of the Dataset coordinates.

Minimal Complete Verifiable Example

>>> ssp2_ds = cat["xwrf-sample-ssp245"].to_dask().xwrf.postprocess()
>>> ssp2_ds.x.attrs
{'units': 'm', 'standard_name': 'projection_x_coordinate', 'axis': 'X'}

>>> ssp2_ds['U'] = ssp2_ds['U'].xwrf.destagger()
>>> ssp2_ds.x.attrs
{'units': 'm',
 'standard_name': 'projection_x_coordinate',
 'axis': 'X',
 'c_grid_axis_shift': 0.5}

Relevant log output

No response

Environment

on xwrf/main

System Information
------------------
xWRF commit : None
python      : 3.9.13 | packaged by conda-forge | (main, May 27 2022, 16:56:21) 
[GCC 10.3.0]
python-bits : 64
OS          : Linux
OS-release  : 4.18.0-305.25.1.el8_4.x86_64
machine     : x86_64
processor   : x86_64
byteorder   : little
LC_ALL      : en_US.UTF-8
LANG        : en_US.UTF-8
LOCALE      : ('en_US', 'UTF-8')

Installed Python Packages
-------------------------
cf_xarray   : None
dask        : 2022.9.0
donfig      : 0.7.0
matplotlib  : 3.5.3
metpy       : 1.3.1
netCDF4     : 1.6.1
numpy       : 1.22.4
pandas      : 1.4.4
pint        : 0.19.2
pooch       : v1.6.0
pyproj      : 3.4.0
xarray      : 2022.6.0
xgcm        : 0.8.0
xwrf        : 0.0.1.post1

Anything else we need to know?

No response

There are a couple things going on:

  1. c_grid_axis_shift should have been dropped along with stagger...that one's my bad on forgetting! I can get a PR in for that shortly.

  2. When I made #93 to adapt #37, I envisioned the Dataset method to be primary and the DataArray secondary. The number of DataArray destaggering related issues that have come up since merging #93 are definitely symptomatic of the DataArray method being an afterthought. That makes me wonder if

    • we should hold back on releasing the DataArray method until more careful thought and testing can go into it?
    • some extra documentation should be added clarifying that using the Dataset method is preferred and that the DataArray method is less robust?
  3. The dataset[key] = value_as_dataarray operation can be deceptive based on how you conceptualize the data model. This operation is ends up working like dataset = dataset.merge(value_as_dataarray.to_dataset(), combine_attrs="override"), meaning that any coordinates (and their attrs) in the datarray end up overriding what already existed in the dataset. Safer options are:

    • dataset[key] = value_as_dataarray.variable
    • dataset = dataset.merge(value_as_dataarray, combine_attrs="drop_conflicts")

@lpilz Do you think with #106 and #107 this is sufficiently resolved for the purposes of the v0.0.2 release? Or should we still consider some documentation changes around the DataArray destaggering lacking robustness?

lpilz commented

Yes, I would say it's ready for v0.0.2. Thanks for the detailed reply, I never realized how finicky using DataArrays and Datasets together is, but this is probably due to my xarray-ignorance...

Closing this for now, I guess. If somebody disagrees strongly we can always reopen and discuss this further.