xarray-contrib/xarray-schema

parsing method in xarray-schema

giovp opened this issue · 4 comments

giovp commented

hi,

thanks for the really great package. I was wondering if there is any plan to also provide methods for parsing array-like data according to pre-defined schemas, similar to what xarray-dataclasses provides.

possibly related to #12

thanks!

Hi @giovp! Can you explain a bit more about your use case? I'm not sure exactly what you mean by "parsing array-like data".

Closing this issue since we haven't managed to keep the conversation going. Feel free to reopen if there is more to discuss here.

giovp commented

@jhamman apologies for the late reply. What I had in mind is basically an additional parse method that tries to cast the input data to match the schema. For example, given a 2D array of an image and some DataArraySchema, the parse method would return a schema-compliant xarray DataArray (or raise an error if the data cannot be cast correctly).

This is an example implementation that inherits from DataArraySchema:

from xarray_schema.components import (
    ArrayTypeSchema,
    DimsSchema,
)
from dask.array.core import from_array
import numpy as np
from xarray_schema.dataarray import DataArraySchema
from dask.array.core import Array as DaskArray
from xarray import DataArray

class RasterSchema(DataArraySchema):
    
    @classmethod
    def parse(
        cls,
        data,
    ) -> DataArray:
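        # Look up the schema on cls (not a hard-coded subclass) so that
        # every subclass parses with its own array_type and dims.
        if cls.array_type.array_type == DaskArray:
            # The schema expects a dask array, so wrap the input first.
            return DataArray(from_array(data), dims=cls.dims.dims)
        return DataArray(data, dims=cls.dims.dims)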


class ImageModel(RasterSchema):
    dims = DimsSchema(("y", "x"))
    array_type = ArrayTypeSchema(DaskArray)

    def __init__(self) -> None:
        super().__init__(
            dims=self.dims,
            array_type=self.array_type,
        )

arr = np.random.normal(size=(100,100))
img = ImageModel.parse(arr)
ImageModel().validate(img)  # returns None: img matches the schema
img = img.rename({"x": "a"})
ImageModel().validate(img)  # raises an error: dims are now ("y", "a")
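
To make the "error if it cannot be cast" part concrete, here is a minimal sketch that uses only numpy and xarray. `parse_raster` and its `dims` argument are hypothetical names for illustration, not part of xarray-schema, and the rank check stands in for whatever validation the schema would actually run:

```python
import numpy as np
import xarray as xr


def parse_raster(data, dims=("y", "x")):
    """Hypothetical helper: coerce array-like input to a DataArray or raise."""
    arr = np.asarray(data)
    # Refuse input whose rank cannot be mapped onto the expected dims.
    if arr.ndim != len(dims):
        raise ValueError(
            f"expected {len(dims)} dimensions {dims}, got ndim={arr.ndim}"
        )
    return xr.DataArray(arr, dims=dims)


img = parse_raster(np.random.normal(size=(100, 100)))
print(img.dims)
```

The idea is that parsing either returns something that already satisfies the schema or fails early, so callers never hold a half-valid object.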

I believe some of this tooling is implemented in https://github.com/astropenguin/xarray-dataclasses/ but I was wondering if there are ways to converge.

Thanks @giovp! This sounds similar in scope to what I wrote here: #45