xarray-contrib/datatree

setting node name breaks tree linkage

Closed this issue · 8 comments

from datatree import DataTree

# a simple tree
root = DataTree(name='root')
child = DataTree(name='child', parent=root)
grandchild = DataTree(name='grandchild', parent=child)

# changing the name of a child node does not update the dict key in its parent's children
child.name = 'childish'
print(root)  # this appears to be fine
print(list(root.children))  # however, the keys in root.children have not been updated
print(root['childish'])  # so this fails

A simple fix seems to be that wherever the name property is set, the keys in self.parent.children are also updated as needed. I'm not sure whether these keys are stored anywhere else that would also need updating.
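To make the proposed fix concrete, here is a minimal sketch on a toy class. `Node` is a hypothetical stand-in written for illustration, not the actual datatree implementation; it only assumes that a parent keeps its children in a dict keyed by name. It does no validation, so renaming onto a sibling's name would silently overwrite that sibling.

```python
from __future__ import annotations


class Node:
    """Toy tree node (hypothetical, not the datatree class)."""

    def __init__(self, name: str, parent: Node | None = None) -> None:
        self._name = name
        self.parent = parent
        self.children: dict[str, Node] = {}
        if parent is not None:
            parent.children[name] = self

    @property
    def name(self) -> str:
        return self._name

    @name.setter
    def name(self, new_name: str) -> None:
        old_name = self._name
        if self.parent is not None and new_name != old_name:
            # Rebuild the parent's dict so the renamed child keeps its position.
            self.parent.children = {
                (new_name if key == old_name else key): node
                for key, node in self.parent.children.items()
            }
        self._name = new_name


root = Node("root")
child = Node("child", parent=root)
grandchild = Node("grandchild", parent=child)

child.name = "childish"
print(list(root.children))                 # ['childish']
print(root.children["childish"] is child)  # True
```

Rebuilding the dict rather than popping and re-inserting keeps the sibling order stable, which may or may not matter for the real class.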

Thank you for reporting this! The offending setter is here:

@name.setter

This should update the key the node is stored under in its parent.

This should be a pretty simple fix if you (or perhaps @etienneschalk ?) are interested in taking it on? (If not then no worries)

Hello @TomNicholas

In the context of merging datatree into xarray, should new developments continue to be made on this repo, or in the xarray repo? Or is there a code freeze until datatree can be worked on from inside the xarray repo? Or will new developments made here simply be integrated into xarray with some git wizardry?

Edit: the answer is in the README: https://github.com/xarray-contrib/datatree?tab=readme-ov-file#deprecation-notice

I think we accept bug fixes here, but not new features. And whilst those bugfixes will be moved to xarray, you won't necessarily get full attribution for them (i.e. I'll probably do it the dumb copy-paste way instead of the git wizardry way).

But we should fix the bug here! Because people will still be using this repository for a while yet (as this is what is uploaded to PyPI/conda as xarray-datatree).

I'm happy to tackle the fix, but will be traveling for a conference that runs through most of next week, so probably wouldn't get to it until after that. If someone else wants to fix it before then, by all means ;)

What should the expected behaviour be when renaming a child node to None?

I had a look at how xarray behaves when renaming a DataArray inside of a Dataset. It seems that the renaming is just ignored when trying to change the name property of the DataArray directly:

Reference: https://docs.xarray.dev/en/stable/generated/xarray.DataArray.name.html

>>> import xarray as xr
>>> xds = xr.Dataset({"a": xr.DataArray([1])})
>>> print(xds)
<xarray.Dataset>
Dimensions:  (dim_0: 1)
Dimensions without coordinates: dim_0
Data variables:
    a        (dim_0) int64 1
>>> print(xds["a"])
<xarray.DataArray 'a' (dim_0: 1)>
array([1])
Dimensions without coordinates: dim_0
>>> # setting the name on the result of a lookup has no visible effect
>>> xds["a"].name = "toto"
>>> print(xds["a"])
<xarray.DataArray 'a' (dim_0: 1)>
array([1])
Dimensions without coordinates: dim_0
>>> # setting the name on a DataArray held in a variable renames that object only
>>> xda = xds["a"]
>>> xda.name = "toto"
>>> print(xda)
<xarray.DataArray 'toto' (dim_0: 1)>
array([1])
Dimensions without coordinates: dim_0
>>> # the Dataset itself is unchanged
>>> print(xds)
<xarray.Dataset>
Dimensions:  (dim_0: 1)
Dimensions without coordinates: dim_0
Data variables:
    a        (dim_0) int64 1
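One plausible reading of the output above is that each `xds["a"]` lookup builds a new DataArray object, so assigning `.name` on it only changes a temporary. The sketch below reproduces that pattern in plain Python. `Labeled` and `Container` are hypothetical toy classes, not xarray types, and the fresh-object-per-lookup behaviour is an assumption of the sketch rather than a statement about xarray internals.

```python
class Labeled:
    """Toy stand-in for a named array (hypothetical, not xarray.DataArray)."""

    def __init__(self, values, name):
        self.values = values
        self.name = name


class Container:
    """Toy stand-in for a dataset that stores raw values keyed by name."""

    def __init__(self, data):
        self._data = dict(data)

    def __getitem__(self, key):
        # A new wrapper is built on every lookup and labeled with the key.
        return Labeled(self._data[key], name=key)


ds = Container({"a": [1]})

ds["a"].name = "toto"   # renames a temporary object that is then discarded
print(ds["a"].name)     # a

da = ds["a"]
da.name = "toto"        # renames the object we keep a reference to
print(da.name)          # toto
print(ds["a"].name)     # a (the container is unaffected)
```

Under this model the container never sees the rename, because the name lives on the wrapper while the key lives in the container.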

@etienneschalk, I find that to be very counterintuitive behavior. My naive expectation would be that the variable should be renamed as desired and the dataset updated to reflect that, and if there was any issue (like renaming to None or to the name of another variable) an exception would be raised. Of course, this is an xarray issue.
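The expectation described above (rename succeeds, or an exception is raised for None or a name that is already taken) could look roughly like this. `rename_key` is a hypothetical helper operating on a plain dict; it is not an xarray or datatree function, and the choice of exception types is an assumption.

```python
def rename_key(mapping: dict, old: str, new: object) -> dict:
    """Return a copy of `mapping` with key `old` renamed to `new`.

    Hypothetical helper: raises instead of silently ignoring a bad rename.
    """
    if old not in mapping:
        raise KeyError(old)
    if not isinstance(new, str):
        raise TypeError(f"new name must be a string, got {new!r}")
    if new != old and new in mapping:
        raise ValueError(f"name {new!r} is already in use")
    # Preserve insertion order while swapping the key.
    return {(new if key == old else key): value for key, value in mapping.items()}


children = {"a": 1, "b": 2}
print(rename_key(children, "a", "toto"))  # {'toto': 1, 'b': 2}

for bad_name in (None, "b"):
    try:
        rename_key(children, "a", bad_name)
    except (TypeError, ValueError) as err:
        print(type(err).__name__, err)
```

Returning a new dict instead of mutating in place keeps the failure cases free of partial updates.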

Closing in favour of the discussion upstream in pydata/xarray#9447