/ldcpy

Statistical and visual tools for gathering metrics and comparing Earth System Model data files. A common use case is comparing data that has been lossily compressed with the original data.


DOI: 10.5281/zenodo.215409079

Large Data Comparison for Python

ldcpy is a utility for gathering and plotting metrics from NetCDF or Zarr files using the Pangeo stack. It also contains a number of statistical and visual tools for gathering metrics and comparing Earth System Model data files.
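To make the idea of comparing lossily compressed data with the original concrete, here is a minimal sketch in plain Python. It does not use ldcpy's API. The function name `comparison_metrics`, the sample values, and the use of rounding as a stand-in for lossy compression are all assumptions made for illustration.

```python
import math


def comparison_metrics(original, reconstructed):
    """Return simple pointwise error metrics for two equal-length sequences.

    Hypothetical helper for illustration only; it is not part of ldcpy.
    """
    if len(original) != len(reconstructed):
        raise ValueError("inputs must have the same length")
    if not original:
        raise ValueError("inputs must be non-empty")
    # Signed error of the reconstructed value relative to the original.
    errors = [r - o for o, r in zip(original, reconstructed)]
    n = len(errors)
    return {
        "mean_error": sum(errors) / n,
        "max_abs_error": max(abs(e) for e in errors),
        "rmse": math.sqrt(sum(e * e for e in errors) / n),
    }


# Made-up sample values; rounding to one decimal mimics a lossy round trip.
original = [280.137, 281.562, 279.904, 283.418]
reconstructed = [round(v, 1) for v in original]
metrics = comparison_metrics(original, reconstructed)
print(metrics)
```

On real Earth System Model output, the same kind of comparison would be made across whole NetCDF or Zarr variables rather than short lists.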

AUTHORS: Alex Pinard, Allison Baker, Anderson Banihirwe, Dorit Hammerling
COPYRIGHT: 2024 University Corporation for Atmospheric Research
LICENSE: Apache 2.0

Documentation and usage examples are available here.

Reference to ldcpy paper

A. Pinard, D. M. Hammerling, and A. H. Baker. Assessing differences in large spatio-temporal climate datasets with a new Python package. In The 2020 IEEE International Workshop on Big Data Reduction, 2020. doi: 10.1109/BigData50022.2020.9378100.

Link to paper: https://doi.org/10.1109/BigData50022.2020.9378100

Installation using Conda (recommended)

Ensure conda is up to date and create a clean Python (3.6+) environment:

conda update conda
conda create --name ldcpy python=3.8
conda activate ldcpy

Now install ldcpy:

conda install -c conda-forge ldcpy

Alternative Installation

Ensure pip is up to date and that your version of Python is at least 3.6:

pip install --upgrade pip
python --version

Install cartopy using the instructions provided at https://scitools.org.uk/cartopy/docs/latest/installing.html.

Then install ldcpy:

pip install ldcpy
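After either installation route, a quick check from Python can confirm that the interpreter meets the 3.6 minimum and that the package is visible in the active environment. This is a minimal sketch: the helper name `installed_version` is hypothetical, and it relies on the standard-library `importlib.metadata` module, which assumes Python 3.8 or newer.

```python
import sys
from importlib import metadata  # standard library from Python 3.8 onward


def installed_version(package):
    """Return the installed version string of a package, or None if absent.

    Hypothetical helper for illustration only.
    """
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# The installation instructions require Python 3.6 or newer.
python_ok = sys.version_info >= (3, 6)
print("Python version OK:", python_ok)

version = installed_version("ldcpy")
if version is None:
    print("ldcpy is not installed in this environment")
else:
    print("ldcpy version:", version)
```

If the package is reported as missing, check that the intended conda environment is active before reinstalling.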

Accessing the tutorial

If you want access to the tutorial notebook, clone the repository (this will create a local repository in the current directory):

git clone https://github.com/NCAR/ldcpy.git

Enable the Hinterland extension for code completion and code hinting in Jupyter Notebook, then launch Jupyter and open the tutorial notebook:

jupyter nbextension enable hinterland/hinterland
jupyter notebook

The tutorial notebook can be found in docs/source/notebooks/TutorialNotebook.ipynb. Feel free to gather your own metrics or create your own plots in this notebook!

Other example notebooks that use the sample data in this repository include PopData.ipynb and MetricsNotebook.ipynb.

The AWSDataNotebook retrieves its data from AWS, so it can be run on a laptop, with the caveat that the files are large.

The following notebooks assume that you are using NCAR's JupyterHub (https://jupyterhub.hpc.ucar.edu): LargeDataGladenotebook.ipynb, CAMNotebook.ipynb, and error_bias.ipynb.
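To see which example notebooks are present in a local clone, the docs/source/notebooks directory can be listed from Python. This is a minimal sketch: the function name `list_notebooks` is hypothetical, and the `"ldcpy"` argument assumes the repository was cloned into a directory of that name under the current working directory.

```python
from pathlib import Path


def list_notebooks(repo_root):
    """Return sorted .ipynb file names under <repo_root>/docs/source/notebooks.

    Hypothetical helper for illustration only; returns an empty list when
    the directory does not exist.
    """
    notebook_dir = Path(repo_root) / "docs" / "source" / "notebooks"
    if not notebook_dir.is_dir():
        return []
    return sorted(p.name for p in notebook_dir.glob("*.ipynb"))


# Assumes the clone lives in ./ldcpy; adjust the path to your checkout.
for name in list_notebooks("ldcpy"):
    print(name)
```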

Re-create notebooks with Pangeo Binder

Try the notebooks hosted in this repo on Pangeo Binder. Note that the session is ephemeral. Your home directory will not persist, so remember to download your notebooks if you make changes that you need to use at a later time!

Note: All example notebooks are in docs/source/notebooks. The easiest ones to try first in Binder are TutorialNotebook.ipynb and PopData.ipynb.
