asdf-format/asdf

`asdf.treeutil.walk_and_modify` does not perform a deep copy

braingram opened this issue · 1 comments

Description of the problem

asdf.treeutil.walk_and_modify documentation states:

Modify a tree by walking it with a callback function. It also has the effect of doing a deep copy.

However the function does not perform a deep copy. For the below example the callback will be called 3 times:

  • with the arr, the same instance that is in the tree
  • with a dictionary {'arr': arr} a different instance than the dictionary tree['a']
  • with a dictionary {'a': {'arr': arr}} a different instance than the dictionary tree

To summarize, the leaf object arr is not copied but all container objects (dicts in this case) are "copied".

Example of the problem

import copy
import asdf
import numpy as np

arr = np.arange(42)

tree = {'a': {'arr': arr}}

copied = copy.deepcopy(tree)
assert copied['a'] is not tree['a']
assert copied['a']['arr'] is not arr

def callback(obj):
    return obj

modified = asdf.treeutil.walk_and_modify(tree, callback)
assert modified['a'] is not tree['a']
# note the missing "not" below, arr was not copied
assert modified['a']['arr'] is arr  # this shouldn't pass

System information

asdf version: main branch
python version: 3.10
operating system: mac osx

Related to this, a callback that modifies a non-dict/non-list node modifies the input.

import asdf

class Foo:
    pass

tree = {'f': Foo()}
assert not hasattr(tree['f'], 'bar')

def callback(obj):
    if isinstance(obj, Foo):
        obj.bar = 1
    return obj

asdf.treeutil.walk_and_modify(tree, callback)
assert hasattr(tree['f'], 'bar')
assert tree['f'].bar == 1