hdmf-dev/hdmf

[Bug]: Append does not work for Zarr Arrays (and DatasetofReferences)

Closed this issue · 11 comments

What happened?

NeurodataWithoutBorders/pynwb#1905

Currently, there isn't an append conditional in `append_data` for Zarr Arrays. This also raised the question of whether we support appending to a "DatasetOfReferences".
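For context, the failure comes from a type dispatch that has no branch for Zarr arrays. A minimal sketch of such a dispatch is below; the `zarr` branch and the module-name check are illustrative assumptions, not hdmf's actual implementation:

```python
def append_data(data, arg):
    """Simplified sketch of an append dispatch with a hypothetical
    Zarr branch (illustrative only; not hdmf's actual code)."""
    mod = type(data).__module__
    if isinstance(data, list):
        data.append(arg)
        return data
    elif mod.startswith("numpy"):
        import numpy as np
        # numpy arrays cannot grow in place, so append returns a new array
        return np.append(data, [arg])
    elif mod.startswith("zarr"):
        # zarr.Array can grow in place along axis 0 via its own append method
        data.append([arg])
        return data
    else:
        # This is the branch the traceback below falls into when the
        # backing dataset type is not recognized.
        raise ValueError("Data cannot append to object of type '%s'" % type(data))
```

Without a branch that recognizes the Zarr-backed dataset type, any append on a reloaded file falls through to the final `ValueError`.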

To reproduce, use the code below rather than the one in the linked pynwb issue.

Steps to Reproduce

# create mock NWB file
from pynwb.testing.mock.file import mock_NWBFile
from pynwb.file import Device
from pynwb.ecephys import ElectrodeGroup
from hdmf_zarr.nwb import NWBZarrIO

nwbfile = mock_NWBFile()

# add device
device = Device(name="my_device", description="my device")
_ = nwbfile.add_device(device)
# add electrode group
eg = ElectrodeGroup(name="tetrode", description="my_tetrode", location="unknown", device=device)
_ = nwbfile.add_electrode_group(eg)

nwbfile.add_electrode_column(name="source", description="1st or 2nd round")

# add a few electrodes
rel_xs = [0., 10., 20., 30.]
rel_ys = [0., 0., 0., 0.]

for x, y in zip(rel_xs, rel_ys):
    nwbfile.add_electrode(
        rel_x=x,
        rel_y=y,
        enforce_unique_id=True,
        source="first",
        group=eg,
        location="unknown"
    )


# save to Zarr (same error in "a" mode)
with NWBZarrIO("test.nwb", mode="w") as io:
    io.write(nwbfile)

# now reload and try to add some more electrodes
io = NWBZarrIO("test.nwb", mode="r+d")
nwbfile_read = io.read()

rel_xs = [50., 60., 70., 80.]
rel_ys = [0., 0., 0., 0.]

for x, y in zip(rel_xs, rel_ys):
    nwbfile_read.add_electrode(
        rel_x=x,
        rel_y=y,
        enforce_unique_id=True,
        source="second",
        group=eg,
        location="unknown"
    )

Traceback

No response

Operating System

macOS

Python Executable

Conda

Python Version

3.12

Package Versions

No response

@mavaylon1 can I ping you on this one? This is a blocking one currently for us! Thanks

@alejoe91 today we merged the fix for hdmf-zarr. It's trickier for hdf5, so that part is still in development. Do you need a release, or can you clone the repo? The PR is merged but not released.

That's ok! I can install hdmf-zarr from source. Can you point me to the merged PR?

@mavaylon1 still getting an error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[1], line 46
     43 rel_ys = [0., 0., 0., 0.]
     45 for x, y in zip(rel_xs, rel_ys):
---> 46     nwbfile_read.add_electrode(
     47         rel_x=x,
     48         rel_y=y,
     49         enforce_unique_id=True,
     50         source="second",
     51         group=eg,
     52         location="unknown"
     53     )

File ~/anaconda3/envs/si/lib/python3.10/site-packages/hdmf/utils.py:668, in docval.<locals>.dec.<locals>.func_call(*args, **kwargs)
    666 def func_call(*args, **kwargs):
    667     pargs = _check_args(args, kwargs)
--> 668     return func(args[0], **pargs)

File ~/anaconda3/envs/si/lib/python3.10/site-packages/pynwb/file.py:728, in NWBFile.add_electrode(self, **kwargs)
    725     else:
    726         d.pop(col_name)  # remove args from d if not set
--> 728 self.electrodes.add_row(**d)

File ~/anaconda3/envs/si/lib/python3.10/site-packages/hdmf/utils.py:668, in docval.<locals>.dec.<locals>.func_call(*args, **kwargs)
    666 def func_call(*args, **kwargs):
    667     pargs = _check_args(args, kwargs)
--> 668     return func(args[0], **pargs)

File ~/anaconda3/envs/si/lib/python3.10/site-packages/hdmf/common/table.py:715, in DynamicTable.add_row(self, **kwargs)
    713     c.add_vector(data[colname])
    714 else:
--> 715     c.add_row(data[colname])
    716     if check_ragged and is_ragged(c.data):
    717         warn(("Data has elements with different lengths and therefore cannot be coerced into an "
    718               "N-dimensional array. Use the 'index' argument when creating a column to add rows "
    719               "with different lengths."),
    720              stacklevel=2)

File ~/anaconda3/envs/si/lib/python3.10/site-packages/hdmf/utils.py:668, in docval.<locals>.dec.<locals>.func_call(*args, **kwargs)
    666 def func_call(*args, **kwargs):
    667     pargs = _check_args(args, kwargs)
--> 668     return func(args[0], **pargs)

File ~/anaconda3/envs/si/lib/python3.10/site-packages/hdmf/common/table.py:52, in VectorData.add_row(self, **kwargs)
     50 """Append a data value to this VectorData column"""
     51 val = getargs('val', kwargs)
---> 52 self.append(val)

File ~/anaconda3/envs/si/lib/python3.10/site-packages/hdmf/container.py:952, in Data.append(self, arg)
    950 def append(self, arg):
    951     self._validate_new_data_element(arg)
--> 952     self.__data = append_data(self.__data, arg)

File ~/anaconda3/envs/si/lib/python3.10/site-packages/hdmf/data_utils.py:43, in append_data(data, arg)
     41 else:
     42     msg = "Data cannot append to object of type '%s'" % type(data)
---> 43     raise ValueError(msg)

ValueError: Data cannot append to object of type '<class 'hdmf_zarr.zarr_utils.ContainerZarrReferenceDataset'>'

This is the same reproduction code as above, run against the dev branch of hdmf-zarr.
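For context on why this case is harder than a plain array: a dataset of references stores references to containers, not the containers themselves, so an append has to resolve each container to a reference first. A toy sketch under that assumption (these class names are illustrative stand-ins, not hdmf-zarr's actual `ContainerZarrReferenceDataset` API):

```python
class Container:
    """Toy container with just a name (illustrative only)."""
    def __init__(self, name):
        self.name = name


class ReferenceDataset:
    """Toy stand-in for a dataset of object references.

    A generic append dispatch cannot handle this type because storing a
    container requires converting it to a reference first.
    """
    def __init__(self, refs):
        self._refs = list(refs)

    def append(self, container):
        # The extra resolution step: store a reference, not the container.
        self._refs.append(("ref", container.name))

    def __len__(self):
        return len(self._refs)


# Usage: appending a container transparently stores a reference to it.
ds = ReferenceDataset([("ref", "dev0")])
ds.append(Container("dev1"))
```

This is why the fix was split between hdmf-zarr (recognizing the Zarr-backed dataset type) and hdmf itself (the reference-resolution step on append).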

@alejoe91 You are correct. I forgot to say there is an hdmf-side ticket in the works. Since you need this right away, I can break the PR into two parts: Part 1 supports hdmf-zarr, and Part 2 adds the hdmf append-references functionality. I will aim to get Part 1 merged today.

@alejoe91 Waiting on the PR to be reviewed. Should be today. I ran the script above with the minor update and it works. I will let you know when it is merged. Let me know if you are running into issues.

Thanks!!

@mavaylon1 should this be fixed when installing hdmf from main?

@alejoe91 This was included in the last release. Let me know if you are running into issues.