Kitware/pan3d

Design Xarray-VTK approach

Closed this issue · 8 comments

Produce a design document for how to implement the interface

Copy to a file and back to memory design

Without the development described in the next post, we can already get a vtkDataSet from an xarray object: call to_netcdf() to save it to an nc file, then read that file with vtkNetCDFCFReader (for data that follows the NetCDF CF conventions). Several other NetCDF readers can be used for other types of data. The drawback is inefficiency, since we write to a file and then read it back, but this should be fine for small datasets.
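The round trip described above can be sketched as follows. This is a minimal illustration, not the project's implementation: the helper name `xarray_to_vtk_via_file` and the temporary file name are made up for the example, and it assumes the dataset follows the CF conventions closely enough for vtkNetCDFCFReader to interpret it. The VTK import is deferred into the function so the module still loads where VTK is not installed.

```python
import os
import tempfile


def xarray_to_vtk_via_file(ds):
    """Convert an xarray Dataset to a VTK data object through a temporary file.

    Hypothetical helper: `ds` is assumed to be an xarray.Dataset holding
    CF-convention data.
    """
    # Deferred import: VTK is only needed when the function is called.
    from vtkmodules.vtkIONetCDF import vtkNetCDFCFReader

    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "xarray_export.nc")

        # Step 1: serialize the in-memory dataset to a NetCDF file.
        ds.to_netcdf(path)

        # Step 2: read the file back with the existing VTK reader.
        reader = vtkNetCDFCFReader()
        reader.SetFileName(path)
        reader.Update()

        # The output lives in memory once Update() has run, so the
        # temporary file can be removed when the block exits.
        output = reader.GetOutputDataObject(0)

    return output
```

Every call pays for a full write and a full read of the data, which is the inefficiency noted above.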

Shallow copy design

Use or modify vtkNetCDFCFReader to parse the xarray structure and reference the xarray data directly instead of copying it. This will allow us to generate a VTK dataset for NetCDF CF data. For other types of data, other NetCDF readers will have to be modified in a similar manner.

In C++, define a vtkNetCDFData class that contains variables, dimensions, and coordinate names as well as pointers to the data arrays. This class will be Python wrapped, so it can be filled in Python from an xarray Python object.

In the vtkNetCDFCFReader implementation, provide two versions of every NetCDF call (all calls with an nc_ prefix). The first version consists of the regular NetCDF calls, which are used when the reader reads from a NetCDF file. The second version reads from a vtkNetCDFData object and is activated when we want the reader to use an xarray object to generate the dataset.

So vtkNetCDFCFReader functions either as a reader, when it reads from an nc file, or as a filter that converts an xarray to a VTK dataset using shallow copies of the data arrays.
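To make the shape of this design concrete, here is a rough Python sketch of the idea. It is only an illustration under stated assumptions: the proposal is for a C++ class, and `NetCDFDataSketch`, `ReaderSketch`, and their method names are hypothetical stand-ins (the accessors are loosely modeled on NetCDF calls such as `nc_inq_dimlen`). Standard-library buffers are used in place of xarray arrays to show that the container references the data rather than copying it.

```python
from array import array
from dataclasses import dataclass, field


@dataclass
class NetCDFDataSketch:
    """Hypothetical stand-in for the proposed vtkNetCDFData class."""

    dimensions: dict = field(default_factory=dict)   # dimension name -> length
    coordinates: list = field(default_factory=list)  # coordinate variable names
    variables: dict = field(default_factory=dict)    # name -> (dims, buffer view)

    def add_variable(self, name, dims, buffer):
        # memoryview references the caller's buffer; no data is copied.
        self.variables[name] = (tuple(dims), memoryview(buffer))


class ReaderSketch:
    """Hypothetical reader with two back ends for each NetCDF-style query."""

    def __init__(self, memory_source=None):
        self.memory_source = memory_source

    def inq_dimlen(self, dim_name):
        if self.memory_source is not None:
            # In-memory version: answer from the filled-in container.
            return self.memory_source.dimensions[dim_name]
        # File version: the real reader would call the NetCDF library here.
        raise NotImplementedError("file-backed path is not part of this sketch")

    def get_var(self, var_name):
        if self.memory_source is not None:
            _dims, view = self.memory_source.variables[var_name]
            return view
        raise NotImplementedError("file-backed path is not part of this sketch")


# Fill the container from "xarray-like" data, then read it through the reader.
temperature = array("d", [280.0, 281.5, 279.0, 282.25])
data = NetCDFDataSketch(dimensions={"x": 4}, coordinates=["x"])
data.add_variable("temperature", ["x"], temperature)

reader = ReaderSketch(memory_source=data)
temperature[0] = 300.0  # a change in the source is visible through the reader
```

Because the reader only holds a view, the source array has to stay alive for as long as the resulting dataset is in use.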

@kmorel Please review. The goal of this design is to convert an xarray to a vtkDataSet. Since you wrote vtkNetCDFCFReader, I thought you might have some good feedback here. Thanks!

Forgive me if my feedback misses the mark. It's been over a decade since I wrote vtkNetCDFCFReader, and this is the first I have looked at xarray.

I get using vtkNetCDFCFReader as a quick-and-dirty way to make the interface work through files. That makes total sense. What I don't understand is using netCDF/CF encoding as an in between format for in-memory transfer.

From what I see in a quick browse of https://docs.xarray.dev/en/stable/user-guide/data-structures.html, the xarray API does not directly use netCDF/CF encoding. Although netCDF has no direct representation of coordinates and time, xarray does. It seems weird to encode xarray's metadata information in more general array attributes only to unencode the same information for a VTK data structure.

Rather than create a vtkNetCDFData object to mimic how netCDF represents arrays and then duplicate the code in xarray and vtkNetCDFCFReader to use this object instead of the netCDF API, why not just make an object that takes an xarray data and returns a VTK data? That sounds like less work and a much cleaner implementation to support.

> Forgive me if my feedback misses the mark. It's been over a decade since I wrote vtkNetCDFCFReader, and this is the first I have looked at xarray.

No apologies needed. I am in the same boat with xarray, and I have never done the deep dive into the CF conventions that writing the reader required. Thank you for taking the time to give me feedback on this!

> I get using vtkNetCDFCFReader as a quick-and-dirty way to make the interface work through files. That makes total sense. What I don't understand is using netCDF/CF encoding as an in between format for in-memory transfer.
>
> From what I see in a quick browse of https://docs.xarray.dev/en/stable/user-guide/data-structures.html, the xarray API does not directly use netCDF/CF encoding.
> Although netCDF has no direct representation of coordinates and time, xarray does.

My thinking here is that an xarray Dataset is simply a NetCDF dataset in memory. See the first paragraph of:
https://docs.xarray.dev/en/stable/generated/xarray.Dataset.html
Based on this, if we have several readers that interpret netcdf arrays (as we do in IO/NetCDF) in different ways, we can use the same readers to interpret xarray arrays.

You are right that xarray has coordinates and time; these seem to come from NetCDF CF. See the following link:
https://docs.xarray.dev/en/stable/user-guide/weather-climate.html

@johnkit Have you seen other types of datasets in xarray besides weather and climate data?

> It seems weird to encode xarray's metadata information in more general array attributes only to unencode the same information for a VTK data structure.

That is a good point. However, I think xarray data is already stored in a NetCDF-like fashion, and the VTK reader is already written.

> Rather than create a vtkNetCDFData object to mimic how netCDF represents arrays and then duplicate the code in xarray and vtkNetCDFCFReader to use this object instead of the netCDF API,

From xarray we will write variable names and array pointers, and we'll read the same in vtkNetCDFCFReader.

> why not just make an object that takes an xarray data and returns a VTK data?

Because I would be duplicating vtkNetCDFCFReader, and maybe other NetCDF readers, if we find additional kinds of data represented in xarray.

> That sounds like less work and a much cleaner implementation to support.

This was @berkgeveci's feeling as well. Maybe I am overestimating how much code there is in vtkNetCDFCFReader.

xarray can also read GDAL raster files, which gives me the final push to link xarray directly to vtkDataSet.
https://docs.xarray.dev/en/stable/user-guide/io.html#rasterio
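As a rough idea of what a direct link could look like, the sketch below converts a three-dimensional xarray DataArray into a vtkRectilinearGrid without going through NetCDF. It is an assumption-laden illustration, not the design that was adopted: the function name `dataarray_to_rectilinear_grid` is hypothetical, the dimensions are assumed to be ordered (z, y, x) with numeric one-dimensional coordinates, and the arrays are deep-copied for simplicity. Passing `deep=False` to `numpy_to_vtk` would avoid the copy, but then the NumPy arrays must be kept alive for as long as the grid is used.

```python
def dataarray_to_rectilinear_grid(da):
    """Build a vtkRectilinearGrid from a 3D xarray.DataArray (hypothetical helper).

    Assumes dims are ordered (z, y, x) and that each dim has numeric values.
    """
    # Deferred imports: NumPy and VTK are only needed when this is called.
    import numpy as np
    from vtkmodules.util.numpy_support import numpy_to_vtk
    from vtkmodules.vtkCommonDataModel import vtkRectilinearGrid

    if da.ndim != 3:
        raise ValueError("this sketch handles exactly three dimensions")

    zdim, ydim, xdim = da.dims
    grid = vtkRectilinearGrid()
    grid.SetDimensions(da.sizes[xdim], da.sizes[ydim], da.sizes[zdim])

    # One coordinate array per axis, taken from the xarray coordinates.
    setters = (
        (xdim, grid.SetXCoordinates),
        (ydim, grid.SetYCoordinates),
        (zdim, grid.SetZCoordinates),
    )
    for dim, set_coordinates in setters:
        axis = np.asarray(da[dim].values, dtype=np.float64)
        set_coordinates(numpy_to_vtk(axis, deep=True))

    # VTK point order has x varying fastest, which matches a C-ordered
    # (z, y, x) array flattened with ravel().
    flat = np.ascontiguousarray(da.values).ravel()
    values = numpy_to_vtk(flat, deep=True)
    values.SetName(str(da.name) if da.name is not None else "values")
    grid.GetPointData().AddArray(values)
    return grid
```

A converter like this reads coordinates straight from the xarray object, so it does not depend on how the data was loaded (NetCDF, zarr, or a raster file).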

Dan, I posted a small zarr example (example.zarr.tgz) at https://data.kitware.com/#item/66d225af9eee4150438e24ae. (GitHub wouldn't let me upload it here for some reason.) Note that you have to untar it before trying to load it :)