RFE: Object storage support
joshmoore opened this issue · 2 comments
Have you considered, for example, reading/writing from/to S3?
In the resave.py script we are working on for the challenge, options like these:
```shell
time ./resave.py \
    zarr/v0.4/idr0001A/2551.zarr \
    --input-bucket=idr \
    --input-endpoint=https://uk1s3.embassy.ebi.ac.uk \
    --input-anon \
    ...
```
prevent the need to download the data locally.
I'm currently working on using zarrs_reencode, but by generating a script:
```shell
./resave.py zarr/v0.4/idr0001A/2551.zarr --output-script ...
```
which produces a script per Zarr array of the form:
```shell
zarrs_reencode --chunk-shape 1,1,1040,1376 --shard-shape 2,16,1040,1376 --dimension-names c,z,y,x --validate \
    zarr/v0.4/idr0001A/2551.zarr/C/3/0 OUTPUT/C/3/0
```
but this of course won't work when the source or target are on S3.
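The generated per-array command can be sketched as a small formatting function. The `reencode_command` helper here is hypothetical; it just reproduces the command shape shown above, not resave.py's real script generation.

```python
def reencode_command(
    src: str,
    dst: str,
    chunk_shape: tuple[int, ...],
    shard_shape: tuple[int, ...],
    dimension_names: tuple[str, ...],
) -> str:
    """Build one zarrs_reencode invocation for a single Zarr array
    (a sketch of what the generated script contains per array)."""

    def fmt(values) -> str:
        # zarrs_reencode takes comma-separated shapes and dimension names.
        return ",".join(str(v) for v in values)

    return (
        f"zarrs_reencode --chunk-shape {fmt(chunk_shape)} "
        f"--shard-shape {fmt(shard_shape)} "
        f"--dimension-names {fmt(dimension_names)} --validate "
        f"{src} {dst}"
    )


cmd = reencode_command(
    "zarr/v0.4/idr0001A/2551.zarr/C/3/0",
    "OUTPUT/C/3/0",
    chunk_shape=(1, 1, 1040, 1376),
    shard_shape=(2, 16, 1040, 1376),
    dimension_names=("c", "z", "y", "x"),
)
```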
I've added read support for HTTP stores in zarrs_tools
version 0.5.5 with #11.
```shell
zarrs_reencode https://uk1s3.embassy.ebi.ac.uk/idr/zarr/v0.4/idr0001A/2551.zarr/C/3/0/0 2551.zarr/C/3/0/0
```
Writing to remote stores is not supported, and I am not sure it is worth adding support given the complexity of supporting many different services + auth. Eventually `zarrs` itself will have a Python wrapper for more flexible usage.
P.S. That location reports missing chunks as permission denied when interpreted as an S3 endpoint. I'm not sure if that is standard. HTTP is fine.
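For context, AWS S3 is documented to return 403 (AccessDenied) rather than 404 for a GET on a nonexistent key when the caller lacks `s3:ListBucket` on the bucket, so "missing" can hide behind "forbidden"; other providers vary. A client could name that ambiguity explicitly with something like the hypothetical helper below (a sketch, not zarrs_tools' behavior):

```python
def chunk_missing(status: int, treat_forbidden_as_missing: bool = False) -> bool:
    """Decide whether an HTTP status from a GET on a chunk key means 'chunk absent'.

    Over plain HTTP a missing chunk is a 404. Against an S3 endpoint the same
    key can come back 403 when the caller lacks list permission on the bucket.
    Whether 403 should be read as 'missing' is a per-deployment policy choice.
    """
    if status == 404:
        return True
    if status == 403 and treat_forbidden_as_missing:
        return True
    return False
```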
> I've added read support for HTTP stores in zarrs_tools version 0.5.5 with #11.
🤯 Amazing. I'll give it a try ASAP.
> Writing to remote stores is not supported, and I am not sure it is worth adding support given the complexity of supporting many different services + auth.
Understood. Certainly one tricky aspect of all of this.
> That location reports missing chunks as permission denied when interpreted as an S3 endpoint. I'm not sure if that is standard.
Heh. Since there's not really a standard, I agree. :) It's definitely my experience that each provider of "S3" has a slightly different take.
> HTTP is fine.
👍