Credentials error raised when trying to read data

Question

Closed this issue 3 months ago · 3 comments

A credentials error is raised when trying to access a dataset via the to_dask method.

import appdirs
import intake

catalog = intake.open_catalog("https://mastapp.site/intake/catalog.yml")
url = "s3://mast/level1/shots/30467.zarr/amc"
dataset = catalog.level1.sources(
    url=url, storage_options={"cache_storage": appdirs.user_cache_dir()}
)
dataset.to_dask()

...

File ~\AppData\Local\pypoetry\Cache\virtualenvs\nova-stella-ThdiHoO_-py3.12\Lib\site-packages\aiobotocore\signers.py:24 in handler
return await self.sign(operation_name, request)

File ~\AppData\Local\pypoetry\Cache\virtualenvs\nova-stella-ThdiHoO_-py3.12\Lib\site-packages\aiobotocore\signers.py:90 in sign
auth.add_auth(request)

File ~\AppData\Local\pypoetry\Cache\virtualenvs\nova-stella-ThdiHoO_-py3.12\Lib\site-packages\botocore\auth.py:423 in add_auth
raise NoCredentialsError()

NoCredentialsError: Unable to locate credentials

Answer 1 · 2024-10-02T14:45:50.000Z

Setting anon=True seems to fix the NoCredentialsError, but I now get a KeyError: '.zmetadata'.

dataset = catalog.level1.sources(
    url=url, storage_options={"cache_storage": appdirs.user_cache_dir(), "s3": {"anon": True}}
)
dataset.to_dask()

...

File ~\AppData\Local\pypoetry\Cache\virtualenvs\nova-stella-ThdiHoO_-py3.12\Lib\site-packages\zarr\convenience.py:1360 in open_consolidated
meta_store = ConsolidatedStoreClass(store, metadata_key=metadata_key)

File ~\AppData\Local\pypoetry\Cache\virtualenvs\nova-stella-ThdiHoO_-py3.12\Lib\site-packages\zarr\storage.py:3046 in __init__
meta = json_loads(self.store[metadata_key])

File ~\AppData\Local\pypoetry\Cache\virtualenvs\nova-stella-ThdiHoO_-py3.12\Lib\site-packages\zarr\storage.py:1448 in __getitem__
raise KeyError(key) from e

KeyError: '.zmetadata'

Answer 2 · 2024-10-02T15:32:06.000Z

Hi @Simon-McIntosh

NoCredentialsError: Unable to locate credentials

By default, s3fs expects an access key and secret, but we explicitly set anon=True in the intake catalog. When you pass your own storage_options to change the cache directory, you overwrite those defaults, hence the credentials error.

KeyError: '.zmetadata'

The cause of this is similar to the first issue: we set the endpoint_url of our storage in the catalog by default, and that gets overwritten too. Adding it back into the options you're overriding works for me:

dataset = catalog.level1.sources(
    url=url, storage_options={"cache_storage": "/tmp", "s3": {"anon": True, 'endpoint_url': "https://s3.echo.stfc.ac.uk"}}
)
dataset.to_dask()

We'll get the temporary path fixed to something more sensible. This is a nice example of how thin access layers let you work around a bug...
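
For anyone who wants to change only the cache directory without retyping the S3 settings, here is a minimal sketch of merging an override into the defaults instead of replacing them. The CATALOG_DEFAULTS dict and the merge_storage_options helper are hypothetical names; the default values are copied from this thread, not read from the catalog, so treat them as an assumption.

```python
from copy import deepcopy

# Assumed defaults, reconstructed from this thread (not read from the catalog).
CATALOG_DEFAULTS = {
    "s3": {"anon": True, "endpoint_url": "https://s3.echo.stfc.ac.uk"},
}


def merge_storage_options(defaults, overrides):
    """Recursively merge overrides into a copy of defaults (hypothetical helper)."""
    merged = deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are dicts: merge nested keys rather than replacing.
            merged[key] = merge_storage_options(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


# Override only the cache location; anon and endpoint_url are kept.
storage_options = merge_storage_options(CATALOG_DEFAULTS, {"cache_storage": "/tmp"})
```

The resulting storage_options can then be passed to catalog.level1.sources(url=url, storage_options=storage_options) in the same way as in the snippet above.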

Answer 3 · 2024-10-03T07:40:51.000Z

Thanks, this fixes it.