Credentials error raised when trying to read data
Closed this issue · 3 comments
A credentials error is raised when trying to access a dataset via the to_dask
method.
import appdirs
import intake
catalog = intake.open_catalog("https://mastapp.site/intake/catalog.yml")
url = "s3://mast/level1/shots/30467.zarr/amc"
dataset = catalog.level1.sources(
    url=url, storage_options={"cache_storage": appdirs.user_cache_dir()}
)
dataset.to_dask()
...
File ~\AppData\Local\pypoetry\Cache\virtualenvs\nova-stella-ThdiHoO_-py3.12\Lib\site-packages\aiobotocore\signers.py:24 in handler
return await self.sign(operation_name, request)
File ~\AppData\Local\pypoetry\Cache\virtualenvs\nova-stella-ThdiHoO_-py3.12\Lib\site-packages\aiobotocore\signers.py:90 in sign
auth.add_auth(request)
File ~\AppData\Local\pypoetry\Cache\virtualenvs\nova-stella-ThdiHoO_-py3.12\Lib\site-packages\botocore\auth.py:423 in add_auth
raise NoCredentialsError()
NoCredentialsError: Unable to locate credentials
Setting anon=True seems to fix the NoCredentialsError, but now I get a KeyError: '.zmetadata':
dataset = catalog.level1.sources(
    url=url, storage_options={"cache_storage": appdirs.user_cache_dir(), "s3": {"anon": True}}
)
dataset.to_dask()
...
File ~\AppData\Local\pypoetry\Cache\virtualenvs\nova-stella-ThdiHoO_-py3.12\Lib\site-packages\zarr\convenience.py:1360 in open_consolidated
meta_store = ConsolidatedStoreClass(store, metadata_key=metadata_key)
File ~\AppData\Local\pypoetry\Cache\virtualenvs\nova-stella-ThdiHoO_-py3.12\Lib\site-packages\zarr\storage.py:3046 in __init__
meta = json_loads(self.store[metadata_key])
File ~\AppData\Local\pypoetry\Cache\virtualenvs\nova-stella-ThdiHoO_-py3.12\Lib\site-packages\zarr\storage.py:1448 in __getitem__
raise KeyError(key) from e
KeyError: '.zmetadata'
NoCredentialsError: Unable to locate credentials
By default, s3fs expects an access key and secret, but we explicitly set anon=True in the intake catalog. When you pass your own storage_options to adjust the cache directory, you overwrite the catalog's defaults, hence the credentials error.
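To illustrate why this happens, here is a minimal pure-Python sketch (the helper `with_defaults` is hypothetical, not part of intake, which replaces user-supplied storage_options wholesale rather than merging them): a plain override drops the catalog's nested s3 settings unless you merge them back in.

```python
def with_defaults(defaults: dict, overrides: dict) -> dict:
    """Recursively layer overrides on top of defaults.

    Hypothetical helper for illustration only: intake itself does NOT
    deep-merge, which is exactly why the catalog's anon/endpoint_url
    defaults disappear when you pass your own storage_options.
    """
    out = dict(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(out.get(key), dict):
            out[key] = with_defaults(out[key], value)
        else:
            out[key] = value
    return out


# Settings the catalog supplies by default (values taken from this thread).
catalog_defaults = {
    "s3": {"anon": True, "endpoint_url": "https://s3.echo.stfc.ac.uk"}
}
# What the user passed: only cache_storage, so the "s3" block is lost
# entirely unless it is restated (or merged back, as sketched here).
user_overrides = {"cache_storage": "/tmp", "s3": {"anon": True}}

opts = with_defaults(catalog_defaults, user_overrides)
print(opts["s3"]["endpoint_url"])  # https://s3.echo.stfc.ac.uk
print(opts["cache_storage"])       # /tmp
```

In other words, the user-level storage_options must restate every nested key the catalog normally provides, which is what the fix below does.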
KeyError: '.zmetadata'
The cause of this is similar to the first issue: the catalog also sets the endpoint_url of our storage by default, and your override drops it. Adding it back into the options works for me:
dataset = catalog.level1.sources(
    url=url, storage_options={"cache_storage": "/tmp", "s3": {"anon": True, "endpoint_url": "https://s3.echo.stfc.ac.uk"}}
)
dataset.to_dask()
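For context on the KeyError itself, here is a simplified pure-Python sketch (an assumption-labelled imitation, not zarr's actual implementation) of what zarr's open_consolidated does: it reads a single '.zmetadata' key from the store, so a store reached through the wrong endpoint, where that key does not exist, fails exactly as in the traceback above.

```python
import json

# A toy key/value "store", standing in for the S3-backed zarr store.
# Without the right endpoint_url, the real store has no '.zmetadata' key.
store = {".zgroup": json.dumps({"zarr_format": 2})}


def open_consolidated(store, metadata_key=".zmetadata"):
    # Sketch of zarr.open_consolidated: fetch one bundled metadata key
    # instead of listing the whole store; KeyError if never consolidated
    # (or, as here, if the store being read is effectively the wrong one).
    return json.loads(store[metadata_key])


try:
    open_consolidated(store)
except KeyError as exc:
    print("not consolidated:", exc)  # not consolidated: '.zmetadata'

# zarr.consolidate_metadata writes this key on the real store;
# sketched here by hand with a minimal payload.
store[".zmetadata"] = json.dumps(
    {"zarr_format": 2, "metadata": {".zgroup": {"zarr_format": 2}}}
)
meta = open_consolidated(store)
print(sorted(meta["metadata"]))  # ['.zgroup']
```

So the two errors share one root cause: the overridden storage_options no longer point the S3 client at the catalog's configured endpoint.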
We'll get the temporary path fixed to something more sensible. This is a nice example of how thin access layers let you work around a bug...
Thanks, this fixes it.