xcube-dev/xcube

Some unit tests failing with xarray 2024.3.0


See e.g. https://ci.appveyor.com/project/bcdev/xcube/builds/49524952/job/6jrh4bi9r3o9unpj.

FAILED test/core/store/fs/test_registry.py::NewCubeDataTestMixin::test_open_unpacked - AssertionError: <class 'numpy.float32'> != dtype('float64')
FAILED test/core/store/fs/test_registry.py::FileFsDataStoresTest::test_mldataset_levels - AssertionError: <class 'numpy.float32'> != dtype('float64')
FAILED test/core/store/fs/test_registry.py::MemoryFsDataStoresTest::test_mldataset_levels - AssertionError: <class 'numpy.float32'> != dtype('float64')
FAILED test/core/store/fs/test_registry.py::S3FsDataStoresTest::test_mldataset_levels - AssertionError: <class 'numpy.float32'> != dtype('float64')
FAILED test/core/test_timeslice.py::TimeSliceTest::test_append_time_slice - ValueError: Specified zarr chunks encoding['chunks']=(180, 2) for variable named 'lat_bnds' would overlap multiple dask chunks ((90, 90), (2,)). Writing this array in parallel with dask could lead to corrupted data. Consider either rechunking using `chunk()`, deleting or modifying `encoding['chunks']`, or specify `safe_chunks=False`.

The <class 'numpy.float32'> != dtype('float64') failures are due to pydata/xarray#8713 fixing pydata/xarray#2304 in xarray 2024.3.0. In summary:

  • The failures were from tests using a Zarr with a float variable encoded as int16.
  • Previously, xarray produced a float32 when decoding this variable from the Zarr.
  • In the issue I linked above, it was noted that the NetCDF standard says: "When packed data is read, it should be unpacked to the type of the scale_factor and add_offset attributes, which must have the same type if both are present." xarray 2024.3.0 now implements this behaviour.
  • In a Zarr, these attributes are stored in a JSON file which represents them as decimal numbers without an associated dtype.
  • When xarray reads the Zarr, the attribute values are parsed as native Python floats (which are 64-bit) and then converted to NumPy floats, which are therefore float64.
  • Per the NetCDF standard, this np.float64 type is then used for the actual variable data.
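A minimal sketch of that round trip (variable names, values, and the store path are made up for illustration, not taken from the failing tests):

```python
import numpy as np
import xarray as xr

ds = xr.Dataset({"temperature": ("x", np.array([21.5, 22.0, 22.5]))})
ds["temperature"].encoding = {
    "dtype": "int16",                  # pack the float as int16
    "scale_factor": np.float32(0.01),  # 32-bit on the way in ...
    "add_offset": np.float32(0.0),
    "_FillValue": np.int16(-9999),
}
ds.to_zarr("packed.zarr", mode="w")

# ... but .zattrs stores scale_factor/add_offset as plain JSON numbers,
# so they come back as 64-bit Python floats. Per the NetCDF rule quoted
# above, the unpacked data takes their type:
reopened = xr.open_zarr("packed.zarr")
print(reopened["temperature"].dtype)  # float64 (float32 before xarray 2024.3.0)
```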

So as far as I can see, a variable in a Zarr with scale_factor and add_offset encoding attributes will from now on always be read as float64.
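If code or tests depend on the old behaviour, one possible workaround (my suggestion, not something prescribed by the linked xarray PRs) is to cast back after decoding:

```python
import numpy as np
import xarray as xr

# Hypothetical: restore the pre-2024.3.0 dtype after opening.
ds = xr.open_zarr("packed.zarr")
ds["temperature"] = ds["temperature"].astype(np.float32)
```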

The append failure is due to pydata/xarray#8459 fixing pydata/xarray#8882. append_time_slice unchunks the co-ordinate variables after every append, which breaks the next append, since the incoming slice's co-ordinates are chunked.
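For reference, xarray's safe-chunks check can be tripped directly, without going through append_time_slice; a minimal sketch (shape and chunking taken from the error message above, store name hypothetical):

```python
import dask.array as da
import xarray as xr

# lat_bnds is dask-chunked as (90, 2) along lat ...
ds = xr.Dataset(
    {"lat_bnds": (("lat", "bnds"), da.zeros((180, 2), chunks=(90, 2)))}
)

# ... while the encoding asks for a single (180, 2) zarr chunk, so one
# zarr chunk would be assembled from two dask chunks written in parallel.
ds["lat_bnds"].encoding["chunks"] = (180, 2)

# Raises: ValueError: Specified zarr chunks encoding['chunks']=(180, 2) ...
ds.to_zarr("slice.zarr", mode="w")
```

Rechunking with ds.chunk({"lat": 180}) before writing, dropping encoding["chunks"], or passing safe_chunks=False (giving up the parallel-write safety check) all avoid the error.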