Programmatic access to the ESGF holdings

Primary language: Python. License: BSD 3-Clause "New" or "Revised" License (BSD-3-Clause).

intake-esgf

Overview

intake-esgf is an intake-esm-inspired package under development in ESGF2. The main difference is that, instead of querying a static index that is loaded in full at runtime, intake-esgf catalogs start empty and are populated by searching ESGF index nodes.
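As a rough illustration of that workflow, the sketch below creates an empty catalog, populates it with a search, and loads the results. It assumes the `ESGFCatalog` class with its `search`, `df`, and `to_dataset_dict` members as described in the package documentation. The facet values are illustrative CMIP6 choices and the wrapper function `load_example` is hypothetical; neither comes from this README. Calling the function requires network access and may download a sizeable amount of data.

```python
def load_example():
    """Search ESGF and load the matching datasets (needs network access)."""
    # Imported inside the function so that defining it needs no install or network.
    from intake_esgf import ESGFCatalog

    cat = ESGFCatalog()  # the catalog starts empty; no static index is loaded

    # Illustrative facet values (assumptions, not taken from this README).
    cat.search(
        experiment_id="historical",
        source_id="CanESM5",
        variable_id="tas",
        frequency="mon",
        variant_label="r1i1p1f1",
    )

    print(cat.df)  # aggregated search results as a pandas DataFrame
    return cat.to_dataset_dict()  # dictionary of xarray Datasets
```

Consult the documentation for the supported facets and options before relying on this sketch.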

Installation

You may install intake-esgf using pip:

python -m pip install intake-esgf

or conda-forge:

conda install -c conda-forge intake-esgf

Features

For a full listing of features with code examples, please consult the documentation. In brief, intake-esgf aims to hide some of the complexity of obtaining ESGF data and to get the data to the user as quickly as possible.

  • Indices are queried in parallel and report when they fail to return a response. The results are aggregated and presented to the user as a pandas DataFrame.
  • The locations of the data are hidden from the user. Internally, we track which locations provide the fastest transfers and automatically favor them.
  • Files are downloaded in parallel into a local cache which mirrors the remote storage directory structure. They are returned to the user as a dictionary of xarray Datasets. Your search script then becomes the way you download data as well as how you load it into memory for your analysis.
  • Prior to downloading data, we first check that it is not already available locally. This could be because you previously downloaded it, or because you are working on a server that has direct access to the data.
  • Cell measure information is harvested from your search results and automatically included in the returned datasets.
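
To make two of the ideas above concrete, here is a minimal standard-library sketch of querying several indices in parallel while reporting failures, and of checking local locations before downloading into a cache that mirrors the remote directory structure. This is not intake-esgf's implementation: the names `search_all`, `local_path`, and `fetch`, as well as the callables passed to them, are hypothetical and chosen only for illustration.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from pathlib import Path


def search_all(indices, query):
    """Query every index in parallel; return (merged records, failed index names).

    `indices` maps a name to a callable that takes `query` and returns a list.
    """
    results, failed = [], []
    with ThreadPoolExecutor(max_workers=max(1, len(indices))) as pool:
        futures = {pool.submit(fn, query): name for name, fn in indices.items()}
        for future in as_completed(futures):
            name = futures[future]
            try:
                results.extend(future.result())
            except Exception as exc:
                # Report the failure but keep the results from the other indices.
                failed.append(name)
                print(f"index {name} failed: {exc}")
    return results, failed


def local_path(root, remote_path):
    """Mirror the remote directory layout underneath a local root."""
    return Path(root) / remote_path.lstrip("/")


def fetch(remote_path, search_roots, download):
    """Return a local copy, downloading only if no root already has the file.

    `search_roots[0]` is treated as the writable cache; any further roots stand
    for read-only locations such as storage mounted directly on a server.
    """
    for root in search_roots:
        candidate = local_path(root, remote_path)
        if candidate.is_file():
            return candidate
    target = local_path(search_roots[0], remote_path)
    target.parent.mkdir(parents=True, exist_ok=True)
    download(remote_path, target)  # hypothetical callable that writes `target`
    return target
```

In this sketch a second call to `fetch` for the same remote path finds the cached file and skips the download, which is the behavior the list above describes.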