zarr-developers/zarr-python

bug in save for variable length arrays?

JamiePringle opened this issue · 2 comments

This is related to the issue in #691, but with more specifics. Saving a variable-length (ragged) array with zarr.save fails, but creating the file without the convenience function works. I am running Python 3.9.12 and zarr 2.11.3, and when I try to run

import zarr
import numcodecs
import numpy as np

z = zarr.empty(4, dtype=object, object_codec=numcodecs.VLenArray(int))
z[0] = np.array([1, 3, 5])
z[1] = np.array([4])
z[2] = np.array([7, 9, 14])
z[3] = np.array([1, 1])

zarr.save('jnk.zarr', z)

It fails with a traceback that ends in

File ~/anaconda3/envs/py3_parcels_mpi_bleedingApr2022/lib/python3.9/site-packages/zarr/storage.py:427, in _init_array_metadata(store, shape, chunks, dtype, compressor, fill_value, order, overwrite, path, chunk_store, filters, object_codec, dimension_separator)
    424 if object_codec is None:
    425     if not filters:
    426         # there are no filters so we can be sure there is no object codec
--> 427         raise ValueError('missing object_codec for object array')
    428     else:
    429         # one of the filters may be an object codec, issue a warning rather
    430         # than raise an error to maintain backwards-compatibility
    431         warnings.warn('missing object_codec for object array; this will raise a '
    432                       'ValueError in version 3.0', FutureWarning)

ValueError: missing object_codec for object array

This makes it rather hard to use ragged arrays... Am I doing something dumb? Or is something broken? What I really need to do is write ragged arrays to zarr data stores. When I inspect z.filters, it returns [VLenArray(dtype='<i8')], so the object codec does appear to be attached to the array.

However, if I create the data store manually, it works fine. The following code runs without error:

import zarr
import numcodecs
import numpy as np

store = zarr.DirectoryStore('jnkStore.zarr')
root = zarr.group(store=store)

z = root.empty(shape=(4,), name='z', dtype=object, object_codec=numcodecs.VLenArray(int))
z[0] = np.array([1, 3, 5])
z[1] = np.array([4])
z[2] = np.array([7, 9, 14])
z[3] = np.array([1, 1])

Can you please also include conda list or pip list as appropriate?

Attached is the output of conda list. Cheers, Jamie

conda_list.txt