[Bug]: Assignment of destaggered variable changes Dataset coordinate attributes
Closed this issue · 3 comments
What happened?
I'm still working on updating the tutorial and blogpost and stumbled upon another issue which I can't quite explain. Somehow, destaggering a data_var
individually and assigning it back changes the attribute of the Dataset
coordinates.
Minimal Complete Verifiable Example
>>> ssp2_ds = cat["xwrf-sample-ssp245"].to_dask().xwrf.postprocess()
>>> ssp2_ds.x.attrs
{'units': 'm', 'standard_name': 'projection_x_coordinate', 'axis': 'X'}
>>> ssp2_ds['U'] = ssp2_ds['U'].xwrf.destagger()
>>> ssp2_ds.x.attrs
{'units': 'm',
'standard_name': 'projection_x_coordinate',
'axis': 'X',
'c_grid_axis_shift': 0.5}
Relevant log output
No response
Environment
on xwrf/main
System Information
------------------
xWRF commit : None
python : 3.9.13 | packaged by conda-forge | (main, May 27 2022, 16:56:21)
[GCC 10.3.0]
python-bits : 64
OS : Linux
OS-release : 4.18.0-305.25.1.el8_4.x86_64
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : en_US.UTF-8
LANG : en_US.UTF-8
LOCALE : ('en_US', 'UTF-8')
Installed Python Packages
-------------------------
cf_xarray : None
dask : 2022.9.0
donfig : 0.7.0
matplotlib : 3.5.3
metpy : 1.3.1
netCDF4 : 1.6.1
numpy : 1.22.4
pandas : 1.4.4
pint : 0.19.2
pooch : v1.6.0
pyproj : 3.4.0
xarray : 2022.6.0
xgcm : 0.8.0
xwrf : 0.0.1.post1
Anything else we need to know?
No response
There are a couple things going on:
-
c_grid_axis_shift
should have been dropped along withstagger
...that one's my bad on forgetting! I can get a PR in for that shortly. -
When I made #93 to adapt #37, I envisioned the Dataset method to be primary and the DataArray secondary. The number of DataArray destaggering related issues that have come up since merging #93 are definitely symptomatic of the DataArray method being an afterthought. That makes me wonder if
- we should hold back on releasing the DataArray method until more careful thought and testing can go into it?
- some extra documentation should be added clarifying that using the Dataset method is preferred and that the DataArray method is less robust?
-
The
dataset[key] = value_as_dataarray
operation can be deceptive based on how you conceptualize the data model. This operation is ends up working likedataset = dataset.merge(value_as_dataarray.to_dataset(), combine_attrs="override")
, meaning that any coordinates (and their attrs) in the datarray end up overriding what already existed in the dataset. Safer options are:dataset[key] = value_as_dataarray.variable
dataset = dataset.merge(value_as_dataarray, combine_attrs="drop_conflicts")
Yes, I would say it's ready for v0.0.2. Thanks for the detailed reply, I never realized how finicky using DataArrays and Datasets together is, but this is probably due to my xarray-ignorance...
Closing this for now, I guess. If somebody disagrees strongly we can always reopen and discuss this further.