pangeo-data/rechunker

Rechunker doesn't preserve all attributes


Hi, this may be user error, but in some regression tests I found that my rechunked output store does not contain the same attributes as my input store. I put together a simple script that shows what I'm seeing.

I'm creating a simple Zarr store with attributes on every node (the root group, the `foo` group, and the `foo/bar` array). After rechunking, only the root group and the leaf array still have attributes.

Specifically, the attribute originally on the source's `root.foo` group does not appear on the target's `root.foo`.
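A regression check along these lines can compare attributes node by node. This is a minimal sketch, not part of rechunker or zarr: it assumes both stores are Zarr v2 `DirectoryStore`s on local disk, where each group or array directory may hold a JSON `.zattrs` file, and the helper names `collect_zattrs` and `missing_attrs` are hypothetical.

```python
import json
import os


def collect_zattrs(store_root):
    """Map each node path (relative to store_root) to its decoded .zattrs dict.

    Assumes a Zarr v2 DirectoryStore layout, where a group or array directory
    may contain a JSON file named '.zattrs'. The root node is keyed as ''.
    """
    attrs = {}
    for dirpath, _dirnames, filenames in os.walk(store_root):
        if '.zattrs' in filenames:
            rel = os.path.relpath(dirpath, store_root)
            key = '' if rel == '.' else rel.replace(os.sep, '/')
            with open(os.path.join(dirpath, '.zattrs')) as fp:
                attrs[key] = json.load(fp)
    return attrs


def missing_attrs(source_root, target_root):
    """Return node paths whose source attributes are absent or different in the target."""
    src = collect_zattrs(source_root)
    tgt = collect_zattrs(target_root)
    return sorted(path for path, value in src.items() if tgt.get(path) != value)
```

Calling `missing_attrs('testoutput/source.zarr', 'testoutput/target.zarr')` after running the script below would be expected to list `foo` if its group attributes were dropped as described.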

Edit (02/16/2023): This is without consolidating the metadata on the target store. When target.zarr's metadata is consolidated after creation, the key below does appear in the .zmetadata file, but the target.zarr/foo/.zattrs file still does not exist.

        "foo/bar/.zattrs": {
            "description": "foo description"
        }
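The observation in the edit can also be checked directly against the consolidated metadata. This is a minimal sketch assuming the Zarr v2 consolidated layout (a `.zmetadata` JSON file whose `metadata` entry maps store keys such as `foo/.zgroup` to their documents); `groups_without_attrs` is a hypothetical helper that lists every group that has a `.zgroup` key but no non-empty `.zattrs` key.

```python
import json


def groups_without_attrs(zmetadata_path):
    """Return group paths in consolidated metadata that have no (or empty) attributes.

    Assumes the Zarr v2 consolidated format: a JSON document whose 'metadata'
    entry maps store keys like '.zgroup', 'foo/.zgroup' or 'foo/bar/.zattrs'
    to their decoded contents. The root group is reported as ''.
    """
    with open(zmetadata_path) as fp:
        metadata = json.load(fp)['metadata']
    result = []
    for key in metadata:
        # Only group metadata keys identify groups; skip arrays and attrs.
        if key == '.zgroup':
            prefix = ''
        elif key.endswith('/.zgroup'):
            prefix = key[:-len('/.zgroup')]
        else:
            continue
        attrs_key = prefix + '/.zattrs' if prefix else '.zattrs'
        # A missing key and an empty {} both mean "no attributes".
        if not metadata.get(attrs_key):
            result.append(prefix)
    return sorted(result)
```

Pointed at `testoutput/target.zarr/.zmetadata` after consolidating, the edit note above suggests it would list `foo`.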

Let me know if there are other tests I can provide.

Python 3.9.12 | packaged by conda-forge | (main, Mar 24 2022, 23:27:05)
[Clang 12.0.1 ] on darwin
>>> import zarr
>>> zarr.__version__
'2.13.3'
>>> import rechunker
>>> rechunker.__version__
'0.5.0'

Thanks
Matt

import zarr
from rechunker import rechunk
import shutil


def run_create_input_store():
    # Start from a clean output directory.
    shutil.rmtree('testoutput/', ignore_errors=True)
    store = zarr.DirectoryStore('testoutput/source.zarr')
    root = zarr.group(store=store, overwrite=True)
    foo = root.create_group('foo')
    # Put a 'description' attribute on every node: root group, foo group, bar array.
    root.attrs['description'] = 'root description'
    foo.attrs['description'] = 'foo description'
    bar = foo.ones('bar', shape=(10, 10))
    bar[5, 5] = 3
    bar.attrs['description'] = 'BAR description'
    zarr.consolidate_metadata(store)


def rechunkit():
    openstore = zarr.open_consolidated('testoutput/source.zarr')
    # Rechunk the nested array foo/bar into 5x5 chunks with max_mem='1MB'.
    array_plan = rechunk(openstore, {'foo/bar': (5, 5)},
                         '1MB',
                         'testoutput/target.zarr',
                         temp_store='testoutput/temp.zarr')
    array_plan.execute()


if __name__ == '__main__':
    run_create_input_store()
    rechunkit()
    print('print source\'s foo attributes')
    with open('testoutput/source.zarr/foo/.zattrs') as fp:
        print(fp.read())
    print('print target\'s foo attributes')
    # Per this report, foo/.zattrs is missing from the target, so this open() fails.
    with open('testoutput/target.zarr/foo/.zattrs') as fp:
        print(fp.read())

Thanks for reporting this and sorry I missed your issue. This indeed seems important.

Fixed by #134.
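For anyone stuck on a rechunker release that predates #134, one possible stopgap is to copy the missing `.zattrs` files from the source store into the target after `execute()`. This is a hypothetical sketch, not rechunker's fix: it assumes local Zarr v2 `DirectoryStore`s, only fills in nodes that already exist in the target, and never overwrites a `.zattrs` file the target already has. The helper name `copy_missing_zattrs` is illustrative.

```python
import os
import shutil


def copy_missing_zattrs(source_root, target_root):
    """Copy '.zattrs' files that exist in the source store but not in the target.

    Hypothetical workaround for Zarr v2 DirectoryStores. Only directories that
    already exist in the target are touched, and existing target '.zattrs'
    files are left alone. Returns the sorted node paths that were copied.
    """
    copied = []
    for dirpath, _dirnames, filenames in os.walk(source_root):
        if '.zattrs' not in filenames:
            continue
        rel = os.path.relpath(dirpath, source_root)
        target_dir = os.path.normpath(os.path.join(target_root, rel))
        target_file = os.path.join(target_dir, '.zattrs')
        # Skip nodes missing from the target and nodes that already have attrs.
        if os.path.isdir(target_dir) and not os.path.exists(target_file):
            shutil.copyfile(os.path.join(dirpath, '.zattrs'), target_file)
            copied.append('' if rel == '.' else rel.replace(os.sep, '/'))
    return sorted(copied)
```

Because this edits files outside of zarr, any consolidated metadata in the target is stale afterwards; calling `zarr.consolidate_metadata` on the target store again would refresh it.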