Add option to stream data from cloud instead of downloading locally
mdtanker opened this issue · 1 comments
Currently, all of the datasets available within the fetch module are downloaded and stored on the user's local computer using Pooch. As some of these datasets are large, and as polartoolkit begins to be incorporated into cloud-computing services such as CryoCloud, it would be ideal for users to be able to stream cloud-optimized datasets instead of having to download entire datasets.
For now, this is intended just for raster datasets, which are typically supplied as NetCDF (.nc) or GeoTIFF (.tif) files.
It seems that the .zarr file format may be the best file type to work with cloud storage (https://matthewrocklin.com/blog/work/2018/02/06/hdf-in-the-cloud).
It seems like Pangeo-Forge is perfectly set up for this, if I understand it correctly.
I will experiment with creating a Pangeo-Forge recipe for Bedmap2 and report back here with how it went.
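To make the motivation for a chunked format concrete, here is a minimal, standard-library-only sketch of the bookkeeping behind a partial read: given a grid split into fixed-size chunks, it works out which chunks overlap a requested window and what fraction of all chunks would need to be transferred. The function names, grid size, chunk size, and window are all hypothetical and chosen for illustration; they are not taken from polartoolkit, Zarr, or any real dataset, and real libraries handle this internally.

```python
from math import ceil


def chunks_for_window(shape, chunks, window):
    """Return (row, col) indices of the chunks overlapping a window.

    shape  : (ny, nx) size of the full grid, in cells
    chunks : (cy, cx) size of one chunk, in cells
    window : (y0, y1, x0, x1) half-open cell ranges to read
    """
    ny, nx = shape
    cy, cx = chunks
    y0, y1, x0, x1 = window
    if not (0 <= y0 < y1 <= ny and 0 <= x0 < x1 <= nx):
        raise ValueError("window must be non-empty and lie inside the grid")
    # First and last chunk index touched along each axis (inclusive).
    rows = range(y0 // cy, (y1 - 1) // cy + 1)
    cols = range(x0 // cx, (x1 - 1) // cx + 1)
    return [(r, c) for r in rows for c in cols]


def fraction_transferred(shape, chunks, window):
    """Fraction of all chunks that must be fetched to cover the window."""
    total = ceil(shape[0] / chunks[0]) * ceil(shape[1] / chunks[1])
    return len(chunks_for_window(shape, chunks, window)) / total


# Hypothetical example: a 6000 x 6000 grid stored as 1000 x 1000 chunks,
# from which a 1000 x 1000 window straddling chunk boundaries is requested.
grid_shape = (6000, 6000)
chunk_shape = (1000, 1000)
region = (1500, 2500, 1500, 2500)

needed = chunks_for_window(grid_shape, chunk_shape, region)
print(len(needed))  # 4
print(f"{fraction_transferred(grid_shape, chunk_shape, region):.1%}")  # 11.1%
```

In this made-up case only 4 of 36 chunks would be fetched, whereas a whole-file download always transfers everything. The actual savings depend on how a dataset is chunked and on the regions users typically request.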
Note: This extension seems to allow access to EarthData.
Links:
REMA offers access to their data via AWS:
https://registry.opendata.aws/pgc-rema/
This would be a good dataset to test streaming of gridded data.