mdtanker/polartoolkit

Add option to stream data from cloud instead of downloading locally

mdtanker opened this issue · 1 comments

Currently, all of the datasets available within the fetch module are downloaded and stored on the user's local computer using Pooch. As some of these datasets are large, and as polartoolkit begins to be incorporated into cloud-computing services such as CryoCloud, it would be ideal for users to be able to stream cloud-optimized datasets instead of having to download the entire datasets.

For now, this is intended just for raster datasets, which are typically supplied as NetCDF (.nc) or GeoTIFF (.tif) files.

It seems that Zarr (.zarr) may be the best file format for working with cloud storage, since it stores arrays as separate chunks that can be read independently (https://matthewrocklin.com/blog/work/2018/02/06/hdf-in-the-cloud).

It seems like Pangeo-Forge is perfectly set up for this, if I understand it correctly.

I will experiment with creating a Pangeo-Forge recipe for Bedmap2 and report back here with how it went.

Note: This extension seems to allow access to EarthData.

Links:

REMA offers access to their data via AWS:
https://registry.opendata.aws/pgc-rema/

This would be a good dataset to test streaming of gridded data.