pytroll/pyspectral

Bug in rayleigh correction related to Dask version >= 2021.5.1

adybbroe opened this issue · 6 comments

Code Sample, a minimal, complete, and verifiable piece of code

Since Dask version 2021.5.1 the unit tests fail: calling np.searchsorted on a Dask array with a plain scalar search value no longer works.

import numpy as np
from dask.array import from_array

# A wavelength (nm) to locate in the LUT wavelength grid
wvl = 444.01239999999996
wvl_coord = np.array([440., 445.], dtype=np.float32)
dawvl_coord = from_array(wvl_coord)

idx = np.searchsorted(wvl_coord, wvl)      # works: pure NumPy
daidx = np.searchsorted(dawvl_coord, wvl)  # fails on Dask >= 2021.5.1

This simple code fails on the last line with the error message:

--> 773     meta = np.searchsorted(a._meta, v._meta)
    774     out = blockwise(
    775         _searchsorted_block,

AttributeError: 'float' object has no attribute '_meta'
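The failure appears to come from Dask 2021.5.1 adding its own searchsorted implementation, to which np.searchsorted now dispatches via the __array_function__ protocol; that implementation assumes the search value is also a Dask array carrying a `_meta` attribute, so a bare Python float breaks. A minimal sketch of two ways to sidestep this, assuming that dispatch behavior:

```python
import numpy as np
from dask.array import from_array

wvl = 444.01239999999996
wvl_coord = np.array([440., 445.], dtype=np.float32)
dawvl_coord = from_array(wvl_coord)

# Option 1: fall back to pure NumPy by materializing the coordinate array.
idx_numpy = np.searchsorted(np.asarray(dawvl_coord), wvl)

# Option 2: make the search value a Dask array too, so both arguments
# carry the _meta attribute Dask's implementation expects.
idx_dask = np.searchsorted(dawvl_coord, from_array(np.array([wvl])))

print(idx_numpy)                   # insertion index into the wavelength grid
print(int(idx_dask.compute()[0]))  # same index, computed lazily
```

Option 1 is essentially the workaround discussed below; option 2 keeps the computation lazy at the cost of an extra wrapping step.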

Problem description

Expected Output

The unit tests should pass!

Actual Result, Traceback if applicable

Versions of Python, package at hand and relevant dependencies

dask                      2021.12.0          pyhd8ed1ab_0    conda-forge
h5py                      3.6.0           nompi_py39h7e08c79_100    conda-forge
hdf5                      1.12.1          nompi_h7f166f4_103    conda-forge
python                    3.9.7           hf930737_3_cpython    conda-forge
numpy                     1.21.4           py39hdbf815f_0    conda-forge

Thank you for reporting an issue !

In rayleigh.py, in the get_reflectance method, I believe this workaround solves the issue:

        if hasattr(wvl_coord, 'chunks'):
            idx = np.searchsorted(wvl_coord.compute(), wvl)
        else:
            idx = np.searchsorted(wvl_coord, wvl)

At least this makes the unit tests pass.
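The branching above could also be factored into a small helper so the Dask/NumPy duck-typing check lives in one place. A sketch, where `_as_numpy` and `find_wavelength_index` are hypothetical names for illustration, not existing pyspectral API:

```python
import numpy as np

def _as_numpy(arr):
    """Materialize a chunked (Dask) array to NumPy; pass others through.

    Uses the same duck-typing check as the workaround above: Dask arrays
    expose a ``chunks`` attribute, plain NumPy arrays do not.
    """
    if hasattr(arr, 'chunks'):
        return arr.compute()
    return np.asarray(arr)

def find_wavelength_index(wvl_coord, wvl):
    # Always search on a NumPy array, regardless of the input type.
    return np.searchsorted(_as_numpy(wvl_coord), wvl)
```

With this, get_reflectance would call find_wavelength_index and not need to care whether the LUT coordinates were Dask-ified.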

Do we follow the above, or try to remove some of the Dask-ification of the LUT?
The handling of mutual support for both Dask arrays and pure NumPy arrays in get_reflectance is rather complicated...

@adybbroe Remind me, what is the shape of wvl_coord and how is it generated? I think your workaround is a nice quick fix. I think the short term solution (but longer than your workaround) is to remove the dask-ification of the LUT values if possible. That's something I was playing with in #133, but was also more focused on putting everything into a map_blocks call. I think the long term solution may be to find a way to wrap the entire get_reflectance method (or a helper method that get_reflectance uses) in a decorator that we create that magically does the xarray[dask], xarray[numpy], dask, or numpy conversion and wraps things in a map_blocks call when possible. Then the internal code only needs to deal with numpy arrays. This is something Martin has asked for in pyorbital too after some of my optimizations in satpy.
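The decorator idea sketched above could look something like the following. This is a hypothetical illustration only (the name `numpy_inputs` is made up, and a real version would also need to handle xarray DataArrays and re-wrap the work in map_blocks where possible); it only shows the input-conversion step:

```python
import functools
import numpy as np

def numpy_inputs(func):
    """Hypothetical decorator: materialize chunked (Dask) arguments to
    NumPy before calling, so the wrapped function only ever sees plain
    ndarrays and scalars."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        args = tuple(a.compute() if hasattr(a, 'chunks') else a
                     for a in args)
        kwargs = {key: (val.compute() if hasattr(val, 'chunks') else val)
                  for key, val in kwargs.items()}
        return func(*args, **kwargs)
    return wrapper

@numpy_inputs
def get_index(coords, value):
    # Internal code can assume NumPy semantics throughout.
    return np.searchsorted(coords, value)
```

The appeal is that the internal numerics stay dask-free, and the dask/xarray plumbing is concentrated in one reusable wrapper.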

@djhoese The shape is always (81,) at the moment; it is the wavelengths in nanometers read from the LUT table, from 400 to 800 nm (covering the visible spectrum) in steps of 5 nm.
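As a sanity check on that shape (assuming an arange-style grid for illustration; in pyspectral the actual values come from the LUT file):

```python
import numpy as np

# 400 to 800 nm inclusive at 5 nm spacing: (800 - 400) / 5 + 1 = 81 points
wvl_coord = np.arange(400., 805., 5., dtype=np.float32)
print(wvl_coord.shape)  # (81,)
```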

Your strategy sounds right, though I would not know how to do that on my own. Perhaps I could apply the above quick fix now so we can get @pnuu's PRs merged and make a release, then follow your step 1 for a new release, and reserve step 2 for later?

Actually, I realized with the help of @mraspaud that the tests are wrong: in reality wvl_coord is never a Dask array, it is always a NumPy array. So adjusting the tests to reflect real use cases fixes this.
Then I agree that rayleigh.py, and in particular the get_reflectance method, needs refactoring. The method is way too long and complex, and the handling of Dask versus non-Dask use cases is opaque. Maybe better to treat that in a separate issue and PR.