xarray-schema

Schema validation for Xarray

installation

Install xarray-schema from PyPI:

pip install xarray-schema

Or from conda-forge:

conda install -c conda-forge xarray-schema

Or install it from source:

pip install git+https://github.com/carbonplan/xarray-schema

usage

Xarray-schema's API is modeled after Pandera. The DataArraySchema and DatasetSchema objects both have .validate() methods.

The basic usage is as follows:

import numpy as np
import xarray as xr
from xarray_schema import DataArraySchema, DatasetSchema, CoordsSchema

da = xr.DataArray(np.ones(4, dtype='i4'), dims=['x'], name='foo')

schema = DataArraySchema(dtype=np.integer, name='foo', shape=(4, ), dims=['x'])

schema.validate(da)

You can also use it to validate a Dataset like so:

schema_ds = DatasetSchema({'foo': schema})

schema_ds.validate(da.to_dataset())

Each component of the Xarray data model is implemented as a standalone class:

from xarray_schema.components import (
    DTypeSchema,
    DimsSchema,
    ShapeSchema,
    NameSchema,
    ChunksSchema,
    ArrayTypeSchema,
    AttrSchema,
    AttrsSchema
)

# example constructions
dtype_schema = DTypeSchema('i4')
dims_schema = DimsSchema(('x', 'y', None))  # None is used as a wildcard
shape_schema = ShapeSchema((5, 10, None))  # None is used as a wildcard
name_schema = NameSchema('foo')
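chunk_schema = ChunksSchema({'x': None, 'y': -1})  # None is a wildcard; -1 presumably follows the dask convention (one chunk spanning the dimension)
array_type_schema = ArrayTypeSchema(np.ndarray)  # lowercase name so the class itself is not shadowed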

# Example usage
dtype_schema.validate(da.dtype)

# Each object schema can be exported to JSON format
dtype_json = dtype_schema.to_json()
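
The `None` wildcard in the `DimsSchema` and `ShapeSchema` examples above can be pictured as a position-by-position comparison in which `None` matches any value. The following is a minimal pure-Python sketch of that idea, not xarray-schema's actual implementation. `matches_with_wildcards` is a hypothetical helper introduced only for illustration, and the rule that lengths must agree is an assumption.

```python
from typing import Hashable, Optional, Sequence


def matches_with_wildcards(
    expected: Sequence[Optional[Hashable]],
    actual: Sequence[Hashable],
) -> bool:
    """Return True if `actual` matches `expected`, treating None as a wildcard.

    Hypothetical helper for illustration; not part of xarray-schema.
    """
    # Assumption: a wildcard stands in for exactly one position,
    # so the two sequences must have the same length.
    if len(expected) != len(actual):
        return False
    # Every non-wildcard position must compare equal.
    return all(e is None or e == a for e, a in zip(expected, actual))


# Dimension names: the third dimension may have any name.
print(matches_with_wildcards(("x", "y", None), ("x", "y", "time")))  # True
# Shapes: the last axis may have any length, but the first must be 5.
print(matches_with_wildcards((5, 10, None), (4, 10, 3)))  # False
```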

roadmap

This is a very early prototype of a library. Some key things are missing:

  1. Validation of coords and attrs. These are not implemented yet.
  2. Exceptions: Pandera accumulates schema exceptions and reports them all at once. Currently, we eagerly raise a SchemaError as soon as one is found.
  3. Roundtrip schemas to/from JSON and/or YAML format.
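
To make the difference in item 2 concrete, here is a minimal pure-Python sketch of accumulating validation errors and reporting them together, instead of stopping at the first one. It does not use xarray-schema: the `SchemaError` class, the `check_name` and `check_dims` functions, and `validate_lazily` are all hypothetical stand-ins, and a plain dict plays the role of the object being validated.

```python
from typing import Any, Callable, Dict, List


class SchemaError(Exception):
    """Stand-in exception for this sketch (not imported from xarray-schema)."""


def check_name(obj: Dict[str, Any]) -> None:
    # Hypothetical check: the object must be named 'foo'.
    if obj.get("name") != "foo":
        raise SchemaError(f"name {obj.get('name')!r} != 'foo'")


def check_dims(obj: Dict[str, Any]) -> None:
    # Hypothetical check: the object must have exactly the dims ('x',).
    if tuple(obj.get("dims", ())) != ("x",):
        raise SchemaError(f"dims {obj.get('dims')!r} != ('x',)")


def validate_lazily(
    obj: Dict[str, Any],
    checks: List[Callable[[Dict[str, Any]], None]],
) -> None:
    """Run every check, collect the failures, and raise once at the end."""
    errors: List[str] = []
    for check in checks:
        try:
            check(obj)
        except SchemaError as err:
            # Record the failure and keep going instead of re-raising.
            errors.append(str(err))
    if errors:
        raise SchemaError("; ".join(errors))


bad = {"name": "bar", "dims": ["y"]}
try:
    validate_lazily(bad, [check_name, check_dims])
except SchemaError as err:
    # A single exception carries both the name and the dims failures.
    print(err)
```

With eager raising, only the `name` failure would surface here; the `dims` failure would stay hidden until the first problem was fixed and validation was run again.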

license

All the code in this repository is MIT licensed, but we request that you please provide attribution if reusing any of our digital content (graphics, logo, articles, etc.).

about us

CarbonPlan is a non-profit organization that uses data and science for climate action. We aim to improve the transparency and scientific integrity of climate solutions through open data and tools. Find out more at carbonplan.org or get in touch by opening an issue or sending us an email.