xcube-dev/xcube

Some unit tests failing with xarray 2024.3.0


See e.g. https://ci.appveyor.com/project/bcdev/xcube/builds/49524952/job/6jrh4bi9r3o9unpj.

FAILED test/core/store/fs/test_registry.py::NewCubeDataTestMixin::test_open_unpacked - AssertionError: <class 'numpy.float32'> != dtype('float64')
FAILED test/core/store/fs/test_registry.py::FileFsDataStoresTest::test_mldataset_levels - AssertionError: <class 'numpy.float32'> != dtype('float64')
FAILED test/core/store/fs/test_registry.py::MemoryFsDataStoresTest::test_mldataset_levels - AssertionError: <class 'numpy.float32'> != dtype('float64')
FAILED test/core/store/fs/test_registry.py::S3FsDataStoresTest::test_mldataset_levels - AssertionError: <class 'numpy.float32'> != dtype('float64')
FAILED test/core/test_timeslice.py::TimeSliceTest::test_append_time_slice - ValueError: Specified zarr chunks encoding['chunks']=(180, 2) for variable named 'lat_bnds' would overlap multiple dask chunks ((90, 90), (2,)). Writing this array in parallel with dask could lead to corrupted data. Consider either rechunking using `chunk()`, deleting or modifying `encoding['chunks']`, or specify `safe_chunks=False`.

The <class 'numpy.float32'> != dtype('float64') failures are due to pydata/xarray#8713 fixing pydata/xarray#2304 in xarray 2024.3.0. In summary:

  • The failures were from tests using a Zarr with a float variable encoded as int16.
  • Previously, xarray produced a float32 when decoding this variable from the Zarr.
  • In the issue I linked above, it was noted that the NetCDF standard says: "When packed data is read, it should be unpacked to the type of the scale_factor and add_offset attributes, which must have the same type if both are present." xarray 2024.3.0 now implements this behaviour.
  • In a Zarr, these attributes are stored in a JSON file which represents them as decimal numbers without an associated dtype.
  • When xarray reads the Zarr, the attribute values are parsed as native Python floats (which are 64-bit) and then converted to NumPy floats, which are therefore float64.
  • Per the NetCDF standard, this np.float64 type is then used for the actual variable data.
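A minimal sketch of that round trip (variable names, values, and the store path are made up for illustration, not taken from the failing tests):

```python
import numpy as np
import xarray as xr

ds = xr.Dataset({"temperature": ("x", np.array([21.5, 22.0, 22.5]))})
ds["temperature"].encoding = {
    "dtype": "int16",                  # pack the float as int16
    "scale_factor": np.float32(0.01),  # 32-bit on the way in ...
    "add_offset": np.float32(0.0),
    "_FillValue": np.int16(-9999),
}
ds.to_zarr("packed.zarr", mode="w")

# ... but .zattrs stores scale_factor/add_offset as plain JSON numbers,
# so they come back as 64-bit Python floats. Per the NetCDF rule quoted
# above, the unpacked data takes their type:
reopened = xr.open_zarr("packed.zarr")
print(reopened["temperature"].dtype)  # float64 (float32 before xarray 2024.3.0)
```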

So as far as I can see, a variable in a Zarr with scale_factor and add_offset encoding attributes will from now on always be read as float64.
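If code or tests depend on the old behaviour, one possible workaround (my suggestion, not something prescribed by the linked xarray PRs) is to cast back after decoding:

```python
import numpy as np
import xarray as xr

# Hypothetical: restore the pre-2024.3.0 dtype after opening.
ds = xr.open_zarr("packed.zarr")
ds["temperature"] = ds["temperature"].astype(np.float32)
```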

The append failure is due to pydata/xarray#8459 fixing pydata/xarray#8882. append_time_slice unchunks the co-ordinate variables after every append, which breaks the next append, since the incoming slice's co-ordinates are chunked.
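For reference, xarray's safe-chunks check can be tripped directly, without going through append_time_slice; a minimal sketch (shape and chunking taken from the error message above, store name hypothetical):

```python
import dask.array as da
import xarray as xr

# lat_bnds is dask-chunked as (90, 2) along lat ...
ds = xr.Dataset(
    {"lat_bnds": (("lat", "bnds"), da.zeros((180, 2), chunks=(90, 2)))}
)

# ... while the encoding asks for a single (180, 2) zarr chunk, so one
# zarr chunk would be assembled from two dask chunks written in parallel.
ds["lat_bnds"].encoding["chunks"] = (180, 2)

# Raises: ValueError: Specified zarr chunks encoding['chunks']=(180, 2) ...
ds.to_zarr("slice.zarr", mode="w")
```

Rechunking with ds.chunk({"lat": 180}) before writing, dropping encoding["chunks"], or passing safe_chunks=False (giving up the parallel-write safety check) all avoid the error.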