zarr-developers/zarr-python

Creating a `zarr.group` using `fsspec.FSMap` fails

Swordcat opened this issue.

Zarr version

v2.14.1

Numcodecs version

v0.11.0

Python Version

3.11

Operating System

macOS 13.2.1

Installation

pip, into a virtual environment

Description

Creating a group with the `zarr.group` convenience function and an `fsspec.FSMap` store fails because `mode='w'` is not passed to `_normalize_store_arg`.

This error stems from a change made in #1304: an `fsspec.FSMap` is now promoted to a zarr `FSStore` inside `zarr.storage._normalize_store_arg_v2`, using the default read-only mode (`mode='r'`).
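The mechanism can be modelled in plain Python. In this hypothetical sketch, `SketchFSStore` stands in for zarr's `FSStore`, `normalize_store_arg_sketch` for `_normalize_store_arg_v2`, and a plain `dict` for the `fsspec.FSMap`; none of these names are zarr's API. It assumes only what the tracebacks in this thread show: the wrapper's `__setitem__` raises `ReadOnlyError` when `mode == 'r'`, and the promotion falls back to read-only when no mode is given.

```python
from collections.abc import MutableMapping


class ReadOnlyError(Exception):
    """Stand-in for zarr.errors.ReadOnlyError."""

    def __init__(self):
        super().__init__("object is read-only")


class SketchFSStore(MutableMapping):
    """Hypothetical wrapper mimicking the mode check in FSStore.__setitem__."""

    def __init__(self, mapper, mode="r"):
        self.map = mapper
        self.mode = mode

    def __getitem__(self, key):
        return self.map[key]

    def __setitem__(self, key, value):
        # Same guard as in the traceback: any write is refused in mode 'r'.
        if self.mode == "r":
            raise ReadOnlyError()
        self.map[key] = value

    def __delitem__(self, key):
        if self.mode == "r":
            raise ReadOnlyError()
        del self.map[key]

    def __iter__(self):
        return iter(self.map)

    def __len__(self):
        return len(self.map)


def normalize_store_arg_sketch(store, mode="r"):
    """Promote a bare mapping (standing in for fsspec.FSMap) to the wrapper.

    The default mode is read-only, so a caller that forgets to pass
    mode='w' silently gets a store that rejects writes.
    """
    if isinstance(store, SketchFSStore):
        return store
    return SketchFSStore(store, mode=mode)


fsmap = {}  # a plain dict plays the role of the FSMap here
store = normalize_store_arg_sketch(fsmap)  # no mode passed: read-only
try:
    store[".zgroup"] = b'{"zarr_format": 2}'
except ReadOnlyError as err:
    print("write refused:", err)
```

Nothing reaches the underlying mapping: the refusal comes from the wrapper's mode, not from the storage backend.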

A fix for a similar error in array creation (`zarr.create`) was implemented in #1309.

I propose a similar change to the `zarr.group` function: change its first line from `store = _normalize_store_arg(store, zarr_version=zarr_version)` to `store = _normalize_store_arg(store, zarr_version=zarr_version, mode='w')`.
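A minimal sketch of the proposed change, using hypothetical stand-ins (`WrappedStore`, `normalize`, `group_before` and `group_after` are illustrative names, not zarr functions). The only difference between the two group functions is whether `mode='w'` is forwarded to the normalizer, and that alone decides whether writing the group metadata key succeeds.

```python
class ReadOnlyError(Exception):
    """Stand-in for zarr.errors.ReadOnlyError."""


class WrappedStore:
    """Hypothetical write-guarded wrapper around a bare mapping."""

    def __init__(self, mapping, mode="r"):
        self.mapping = mapping
        self.mode = mode

    def __setitem__(self, key, value):
        if self.mode == "r":
            raise ReadOnlyError("object is read-only")
        self.mapping[key] = value


def normalize(store, mode="r"):
    # Stand-in for _normalize_store_arg: bare mappings are promoted,
    # read-only unless the caller says otherwise.
    if isinstance(store, WrappedStore):
        return store
    return WrappedStore(store, mode=mode)


def group_before(store):
    store = normalize(store)  # mode omitted, so the wrapper is read-only
    store[".zgroup"] = b'{"zarr_format": 2}'  # group metadata write
    return store


def group_after(store):
    store = normalize(store, mode="w")  # the proposed one-line fix
    store[".zgroup"] = b'{"zarr_format": 2}'
    return store


backing = {}
try:
    group_before(backing)
except ReadOnlyError as err:
    print("before fix:", err)

group_after(backing)
print("after fix:", sorted(backing))
```

Forwarding the mode at the call site leaves the normalizer's read-only default untouched for every other caller.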

Steps to reproduce

The error can be reproduced with the following code:

```python
from fsspec.implementations.memory import MemoryFileSystem

import zarr

mfs = MemoryFileSystem()
fsmap = mfs.get_mapper("memory:/tmp/store")
group = zarr.group(store=fsmap)
```

Console output

```
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/brandur/Documents/Repositories/zarr-python-dev/zarr/hierarchy.py", line 1355, in group
    init_group(store, overwrite=overwrite, chunk_store=chunk_store,
  File "/Users/brandur/Documents/Repositories/zarr-python-dev/zarr/storage.py", line 648, in init_group
    _init_group_metadata(store=store, overwrite=overwrite, path=path,
  File "/Users/brandur/Documents/Repositories/zarr-python-dev/zarr/storage.py", line 711, in _init_group_metadata
    store[key] = store._metadata_class.encode_group_metadata(meta)  # type: ignore
    ~~~~~^^^^^
  File "/Users/brandur/Documents/Repositories/zarr-python-dev/zarr/storage.py", line 1410, in __setitem__
    raise ReadOnlyError()
zarr.errors.ReadOnlyError: object is read-only
```


Flagging as a duplicate of #1352.

@martindurant, any thoughts here?

BTW, thanks a lot for reporting this @Swordcat! We really appreciate it, and we're sorry for this regression!

> I propose introducing a similar change to the Zarr.group function, changing the first line of the function from store = _normalize_store_arg(store, zarr_version=zarr_version) to store = _normalize_store_arg(store, zarr_version=zarr_version, mode='w')

This sounds like a great way forward!

I can reproduce the bug with the following setup:

- zarr: 2.13.6
- Python: 3.10.9
- numcodecs: 0.11.0
- OS: Amazon Linux 2
- Installation: conda

It also fails with s3fs:

```python
import s3fs
import zarr

bucket_name = "s3_bucket_I_can write to"  # placeholder: a bucket you have write access to
s3 = s3fs.S3FileSystem(anon=False)
store = s3fs.S3Map(root=f'{bucket_name}/a.zarr', s3=s3, check=False)
zarr_store = zarr.group(store=store)
zarr_store
```

Output:

```
ReadOnlyError                             Traceback (most recent call last)
Cell In[53], line 6
      4 s3 = s3fs.S3FileSystem(anon=False)
      5 store = s3fs.S3Map(root='daskzarrstack-daskzarr87beea34-w35ea5m24oq4/a.zarr', s3=s3, check=False)
----> 6 zarr_store = zarr.group(store=store)
      7 zarr_store
      9 # with s3.open('daskzarrstack-daskzarr87beea34-w35ea5m24oq4/new-file', 'wb') as f:
     10 #     f.write(2*2**20 * b'a')

File ~/anaconda3/envs/zarr_py310_nb/lib/python3.10/site-packages/zarr/hierarchy.py:1355, in group(store, overwrite, chunk_store, cache_attrs, synchronizer, path, zarr_version)
   1352     requires_init = overwrite or not contains_group(store, path)
   1354 if requires_init:
-> 1355     init_group(store, overwrite=overwrite, chunk_store=chunk_store,
   1356                path=path)
   1358 return Group(store, read_only=False, chunk_store=chunk_store,
   1359              cache_attrs=cache_attrs, synchronizer=synchronizer, path=path,
   1360              zarr_version=zarr_version)

File ~/anaconda3/envs/zarr_py310_nb/lib/python3.10/site-packages/zarr/storage.py:643, in init_group(store, overwrite, path, chunk_store)
    640     store['zarr.json'] = store._metadata_class.encode_hierarchy_metadata(None)  # type: ignore
    642 # initialise metadata
--> 643 _init_group_metadata(store=store, overwrite=overwrite, path=path,
    644                      chunk_store=chunk_store)
    646 if store_version == 3:
    647     # TODO: Should initializing a v3 group also create a corresponding
    648     #       empty folder under data/root/? I think probably not until there
    649     #       is actual data written there.
    650     pass

File ~/anaconda3/envs/zarr_py310_nb/lib/python3.10/site-packages/zarr/storage.py:706, in _init_group_metadata(store, overwrite, path, chunk_store)
    704 key = _prefix_to_group_key(store, _path_to_prefix(path))
    705 if hasattr(store, '_metadata_class'):
--> 706     store[key] = store._metadata_class.encode_group_metadata(meta)  # type: ignore
    707 else:
    708     store[key] = encode_group_metadata(meta)

File ~/anaconda3/envs/zarr_py310_nb/lib/python3.10/site-packages/zarr/storage.py:1398, in FSStore.__setitem__(self, key, value)
   1396 def __setitem__(self, key, value):
   1397     if self.mode == 'r':
-> 1398         raise ReadOnlyError()
   1399     key = self._normalize_key(key)
   1400     path = self.dir_path(key)

ReadOnlyError: object is read-only
```

I confirmed that writing to the bucket directly with s3fs works:

```python
with s3.open(f'{bucket_name}/test', 'wb') as f:
    f.write(b"Test line")
```

Do you have a timeline for when the fixes will be available on PyPI or conda? I really appreciate your work on this.

I see no reason why we shouldn't make a release now to fix this rather severe bug.

Zarr 2.14.2 is now on PyPI. conda-forge will take a little longer.

Closed by #1354.