Browsable online website: https://pangeo-data.github.io/pangeo-datastore/
This repository hosts Pangeo's official cloud data catalog, which is an Intake catalog. Most of the data is stored in Zarr format and is meant to be opened with Xarray.
The master Intake catalog URL is:
https://raw.githubusercontent.com/pangeo-data/pangeo-datastore/master/intake-catalogs/master.yaml
Using this catalog requires versions of Intake and its related packages that were quite recent as of April 2019.
To open the catalog and load a dataset from Python, you can run the following code:
import intake
# Open the master catalog directly from its raw GitHub URL
cat_url = 'https://raw.githubusercontent.com/pangeo-data/pangeo-datastore/master/intake-catalogs/master.yaml'
cat = intake.open_catalog(cat_url)
# Load one dataset lazily as a Dask-backed xarray.Dataset
ds = cat.atmosphere.gmet_v1.to_dask()
To list every entry in the catalog, including entries in nested sub-catalogs up to five levels deep, you can run:
cat.walk(depth=5)
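cat.walk(depth=5) returns a dictionary of entries rather than a readable overview. The sketch below turns its keys into an indented tree. It assumes, as in recent Intake releases, that walk() keys entries by dotted names such as atmosphere.gmet_v1 and also includes the sub-catalog entries themselves; format_tree and print_catalog_tree are hypothetical helpers, not part of Intake.

```python
def format_tree(entry_names):
    """Render dotted Intake entry names (e.g. 'atmosphere.gmet_v1') as an
    indented tree, two spaces per nesting level."""
    # Sort on the split path so children always follow their parent entry
    ordered = sorted(entry_names, key=lambda name: name.split("."))
    lines = []
    for name in ordered:
        depth = name.count(".")
        lines.append("  " * depth + name.rsplit(".", 1)[-1])
    return "\n".join(lines)


def print_catalog_tree(cat_url, depth=5):
    """Hypothetical helper: open a catalog and print every entry found by walk()."""
    import intake  # imported lazily; fetching cat_url needs network access

    cat = intake.open_catalog(cat_url)
    # walk() returns a dict keyed by dotted entry names; only the keys are used
    print(format_tree(cat.walk(depth=depth)))
```

With network access, calling print_catalog_tree with the master catalog URL above prints each sub-catalog followed by its indented datasets. Sorting on the split name (rather than the raw string) keeps an entry like atmosphere.gmet_v1 directly under atmosphere even when sibling names contain characters that sort before a dot.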
Several of the datasets in the cloud data catalog are stored in requester-pays storage buckets. This means that a user requesting data must provide their own billing project (created and authenticated through Google Cloud Platform), which is billed for the charges associated with accessing a dataset. To set up a GCP billing project and use it for authentication in applications:
- Create a project on GCP; if this is the first time using GCP, a prompt will appear to choose a Google account to link to all GCP-related activities.
- Create a Cloud Billing account associated with the project and enable billing for the project through this account.
- Using Google Cloud IAM, add the Service Usage Consumer role to your account, which enables it to make billed requests on behalf of the project.
- From the command line, install the Google Cloud SDK, for example with conda:
conda install -c conda-forge google-cloud-sdk
- Initialize the gcloud command-line interface, logging in to the account used to create the aforementioned project and selecting it as the default project; this allows the project to be used for requester-pays access from the command line:
gcloud auth login
gcloud init
- Finally, use gcloud to establish application default credentials; this allows the project to be used for requester-pays access from applications:
gcloud auth application-default login
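Once application default credentials exist, Python libraries that read them can bill requests to your project. The sketch below is one way to check which project the credentials resolve to and to read a Zarr store from a requester-pays bucket directly with gcsfs, outside the Intake catalog. It assumes the google-auth, gcsfs and xarray packages are installed; the helper names, the billing project ID and any bucket path you pass in are hypothetical placeholders, not part of this repository.

```python
def default_credentials_project():
    """Return the project ID resolved from application default credentials
    (None if no default project is configured)."""
    import google.auth  # from the google-auth package

    _credentials, project_id = google.auth.default()
    return project_id


def requester_pays_options(billing_project):
    """Keyword arguments for gcsfs.GCSFileSystem that bill requests to
    billing_project using application default credentials."""
    return {
        "project": billing_project,
        "token": "google_default",  # credentials from `gcloud auth application-default login`
        "requester_pays": True,     # send `project` as the userProject to be billed
    }


def open_requester_pays_zarr(gcs_path, billing_project):
    """Open a Zarr store in a requester-pays bucket as an xarray.Dataset."""
    import gcsfs
    import xarray as xr

    fs = gcsfs.GCSFileSystem(**requester_pays_options(billing_project))
    return xr.open_zarr(fs.get_mapper(gcs_path))
```

For example, open_requester_pays_zarr("gs://<bucket>/<store>.zarr", "my-billing-project") would open a store lazily, with the placeholders replaced by a real requester-pays path and your own project ID. Keeping the gcsfs options in a separate function makes it easy to reuse the same settings wherever a library accepts gcsfs storage options.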
To suggest adding a new dataset, please open an issue.