pytroll/satpy

FCI HRFI true_color unavailable even after native resampling if upper_right_corner is used

gerritholl opened this issue · 20 comments

Describe the bug

When loading the true_color composite from combined FCI FDHSI and HRFI data, the composite remains unavailable even after native resampling.

To Reproduce

import hdf5plugin
from satpy import Scene
from glob import glob
from satpy.utils import debug_on; debug_on()
fci_files = glob(f"/media/nas/x23352/MTG/FCI/2023/12/03/W_XX-EUMETSAT-Darmstadt,IMG+SAT,MTI1+FCI-1C-RRAD-*-FD--CHK-BODY--DIS-NC4E_C_EUMT_20231203*_IDPFI_OPE_20231203*_20231203*_N_JLS_C_0070_*.nc")
sc = Scene(filenames={"fci_l1c_nc": fci_files})
sc.load(["true_color"], upper_right_corner="NE")
ls = sc.resample(resampler="native")
ls.save_datasets()

Expected behavior

I expect to get the true color composite.

Actual results

Output without debug messages:

[INFO: 2023-12-07 17:28:15 : satpy.readers.yaml_reader] Flipping Dataset vis_06 upsidedown.
[INFO: 2023-12-07 17:28:16 : satpy.readers.yaml_reader] Flipping Dataset vis_05 upsidedown.
[INFO: 2023-12-07 17:28:16 : satpy.readers.yaml_reader] Flipping Dataset vis_04 upsidedown.
[INFO: 2023-12-07 17:28:17 : satpy.readers.yaml_reader] Flipping Dataset vis_08 upsidedown.
[INFO: 2023-12-07 17:28:17 : satpy.readers.yaml_reader] Flipping Dataset unknown_name upsidedown.
[INFO: 2023-12-07 17:28:17 : satpy.readers.yaml_reader] Flipping Dataset unknown_name upsidedown.
[INFO: 2023-12-07 17:28:17 : satpy.readers.yaml_reader] Flipping Dataset unknown_name upsidedown.
[INFO: 2023-12-07 17:28:17 : satpy.readers.yaml_reader] Flipping Dataset unknown_name upsidedown.
[INFO: 2023-12-07 17:28:17 : satpy.modifiers.atmosphere] Removing Rayleigh scattering with atmosphere 'us-standard' and aerosol type 'rayleigh_only' for 'vis_06'
[INFO: 2023-12-07 17:28:17 : pyspectral.rayleigh] Atmosphere chosen: us-standard
[WARNING: 2023-12-07 17:28:17 : satpy.scene] The following datasets were not created and may require resampling to be generated: DataID(name='true_color')
[INFO: 2023-12-07 17:28:17 : satpy.modifiers.atmosphere] Removing Rayleigh scattering with atmosphere 'us-standard' and aerosol type 'rayleigh_only' for 'vis_05'
[INFO: 2023-12-07 17:28:17 : pyspectral.rayleigh] Atmosphere chosen: us-standard
[INFO: 2023-12-07 17:28:17 : satpy.composites.spectral] Applying NDVI-weighted hybrid-green correction with limits [0.15, 0.05] and strength 3.0.
[INFO: 2023-12-07 17:28:17 : satpy.modifiers.atmosphere] Removing Rayleigh scattering with atmosphere 'us-standard' and aerosol type 'rayleigh_only' for 'vis_04'
[INFO: 2023-12-07 17:28:17 : pyspectral.rayleigh] Atmosphere chosen: us-standard
[WARNING: 2023-12-07 17:28:17 : satpy.scene] The following datasets were not created and may require resampling to be generated: DataID(name='true_color')
Traceback (most recent call last):
  File "/data/gholl/checkouts/protocode/mwe/fci-true-color-native-resamplying.py", line 9, in <module>
    ls.save_datasets()
  File "/data/gholl/checkouts/satpy/satpy/scene.py", line 1271, in save_datasets
    raise RuntimeError("None of the requested datasets have been "
RuntimeError: None of the requested datasets have been generated or could not be loaded. Requested composite inputs may need to have matching dimensions (eg. through resampling).

The complete debug output is too long for a GitHub issue.

Environment Info:

  • OS: openSUSE 15.3
  • Satpy Version: main (v0.45.0-36-gfbe017c55)

It seems that the bug starts with this commit by @djhoese: dc3ee7d

the same happens with the nearest resampler, if that helps

the same happens with the nearest resampler, if that helps

Really? It works for me with the nearest resampler:

import hdf5plugin
from satpy import Scene
from glob import glob
from satpy.utils import debug_on; debug_on()
fci_files = glob(f"/media/nas/x23352/MTG/FCI/2023/12/03/W_XX-EUMETSAT-Darmstadt,IMG+SAT,MTI1+FCI-1C-RRAD-*-FD--CHK-BODY--DIS-NC4E_C_EUMT_20231203*_IDPFI_OPE_20231203*_20231203*_N_JLS_C_0070_*.nc")
sc = Scene(filenames={"fci_l1c_nc": fci_files})
sc.load(["true_color"], upper_right_corner="NE")
ls = sc.resample("eurol", resampler="nearest")
ls.save_datasets()

Does 'native' work if upper_right_corner is not used?

Really? It works for me with the nearest resampler:

hmm, it works if you specify eurol as target area, but not if you leave the default, which is the finest_area 😕

Does 'native' work if upper_right_corner is not used?

then it works, indeed. So the question is how could dc3ee7d have broken this 😵

So assuming this failure is consistent, then my best guess is this block of code in the base resampler class in pyresample:

https://github.com/pytroll/pyresample/blob/c39f51efe0d769e4b9b1d8b930c4dad988891757/pyresample/resampler.py#L164-L182

This is used as a shortcut to say "if we're resampling to essentially the same area, just return the result". If upper_right_corner makes the areas be considered equivalent (or not equivalent) in a way that wasn't detected before, then perhaps match_data_arrays in the Compositor thinks the areas are not the same resolution and fails. The first check would be to see what error the compositor(s) are getting: is it failing to compare the areas, or something else?
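The shortcut logic described above, in a rough sketch (the class and method names here are illustrative only, not pyresample's actual implementation):

```python
class ShortcutResamplerSketch:
    """Rough sketch of the resampler shortcut (illustrative names,
    not pyresample's real code)."""

    def __init__(self, source_geo_def, target_geo_def):
        self.source_geo_def = source_geo_def
        self.target_geo_def = target_geo_def

    def resample(self, data):
        # If source and target geometries compare equal, return the input
        # unchanged -- skipping precompute/compute and, crucially, skipping
        # any recomputation of the x/y coordinate arrays.
        if self.source_geo_def == self.target_geo_def:
            return data
        return self._precompute_and_compute(data)

    def _precompute_and_compute(self, data):
        # placeholder for the real resampling path
        return data


r = ShortcutResamplerSketch("area_a", "area_a")
print(r.resample("payload"))  # payload (returned untouched)
```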

The issue originates from an IncompatibleAreas that is raised here:

if not all(x.shape == datasets[0].shape for x in datasets[1:]) or \
        (optional_datasets and
         optional_datasets[0].shape != datasets[0].shape):
    raise IncompatibleAreas("RatioSharpening requires datasets of "
                            "the same size. Must resample first.")

... reason being that datasets[1].shape (which is ndvi_hybrid_green) somehow has shape (0, 22272) instead of the expected (22272, 22272).

The issue is indeed related to geometries_are_the_same() as you're saying @djhoese, in the sense that if I comment out these lines https://github.com/pytroll/pyresample/blob/e82bf470dae00258231978037455ecb2e133e796/pyresample/resampler.py#L126-L127
(essentially mimicking the old BaseResampler, as this is the biggest difference), the issue is gone. This explains how dc3ee7d broke the code.

However, it looks like geometries_are_the_same() per se is fine; it gives logical results (actually the same both with and without upper_right_corner, which makes sense).
--> So it looks like the issue lies rather in the fact that when geometries_are_the_same() is True, the dataset is directly returned as-is (which seems to be the problem here), while in the old BaseResampler the datasets always still went through the resampling precompute and compute (which somehow avoided the bug)...

This is where I'm now, I'll try to look further.

Adding a note here to say that this also happens with Himawari data:

bbox = (-1600000, -2770000, 690000, -1040000)

scn = Scene(ifiles_l15, reader='ahi_hsd')
scn.load(['true_color_reproduction'])
scn2 = scn.crop(xy_bbox = bbox)
scn3 = scn2.resample(scn2.coarsest_area(), resampler='native')
scn3.save_datasets(base_dir=tod)#, enhance=False, dtype=np.float32, fill_value=0)

Gives:

RuntimeError: None of the requested datasets have been generated or could not be loaded. Requested composite inputs may need to have matching dimensions (eg. through resampling).

The native, 'nearest', 'bilinear' and gradient_search resamplers all give the error.

If I switch from true_color_reproduction to true_color I do not have the error, though.

We tracked down the issue to slight differences (~10^-10) in coords between channels that are natively in the finest/coarsest area and channels that have been resampled to it. When a compositor tries to do arithmetic on these channels (e.g. the FCI true_color, which computes the NDVI), the coords don't match, so the dimension is killed, which causes IncompatibleAreas and the generation failure.
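The dimension-killing behavior can be demonstrated with plain xarray, independent of satpy (synthetic data below):

```python
import numpy as np
import xarray as xr

# Two arrays on "the same" grid, except the y coordinates differ by ~1e-10,
# mimicking the floating point noise from recomputing an area's coordinates.
y1 = np.linspace(0.0, 1.0, 5)
y2 = y1 + 1e-10
a = xr.DataArray(np.ones((5, 4)), dims=("y", "x"), coords={"y": y1})
b = xr.DataArray(np.ones((5, 4)), dims=("y", "x"), coords={"y": y2})

# xarray arithmetic aligns on coordinates with an inner join by default,
# so no y values match and the y dimension collapses to length 0.
diff = a - b
print(diff.sizes["y"], diff.sizes["x"])  # 0 4
```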

Interesting, thanks for the update. Happy to help test this, might also fix the weird output images issue I mentioned on slack!

Ok, I think I have an explanation:

In theory, those differences shouldn't exist, since we calculate the coords in

x, y = area.get_proj_vectors()

twice on the same area in the same way (once when the finest/coarsest dataset is originally loaded and the area is created, and once again when we resample another dataset to the same finest/coarsest area). So in principle those should match.

BUT, when we use upper_right_corner='NE' in the FCI loading, the area extents of the finest/coarsest area and the dataset (with coordinates) are flipped (here) after being generated. Now, since the area is not exactly centered/symmetrical around the (0,0) point, the second call to area.get_proj_vectors() during the native resampling uses different input values from the flipped area, and hence produces slightly different recomputed coordinates for the resampled array.

In fact, these are the parameters used inside get_proj_vectors during the first area generation (when the scene is still upside down), and then when the native resampler computes the coordinates for the resampled array:

# first call
self.pixel_size_x, self.pixel_size_y
Out[1]: (499.99999996746476, -499.99999996746476)
self.pixel_upper_left[0], self.pixel_upper_left[1]
Out[2]: (-5567749.999637712, -5567749.999637712)

# second call
self.pixel_size_x, self.pixel_size_y
Out[3]: (499.99999996746476, 499.99999996746476)
self.pixel_upper_left[0], self.pixel_upper_left[1]
Out[4]: (-5567749.999637712, 5567749.999637694)

note the different ending digits --> so, no wonder the resulting coordinates don't match anymore, and xarray decides to kill the dimension.

Edit: it's actually not the differing digits; the issue is rather floating point arithmetic precision. See this calculation of the coordinate of the same corner pixel with and without flipping, which ends up with a difference in the last digit:

-(-499.99999996746476)*22271 + (-5567749.999637712) = 5567749.999637695
- 499.99999996746476*0 +  5567749.999637694         = 5567749.999637694
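Plugging the numbers above into plain Python confirms the mismatch (the expected inequality is taken from the calculation shown above):

```python
px = 499.99999996746476  # pixel size in metres

# corner-pixel coordinate computed from the original (upside down) area...
before_flip = -(-px) * 22271 + (-5567749.999637712)
# ...and the same corner computed from the flipped area extents
after_flip = px * 0 + 5567749.999637694

# mathematically the same point, but the floating point results differ
print(before_flip == after_flip)  # False
```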

I didn't check, but my bet would be that the bbox issue is similar, in that the recomputed coordinates after the cropping are slightly different compared to the original ones computed over the full scene - maybe even because of floating point precision?

That being said, I'm not sure what the best fix is, if we want to keep the nice geometries_are_the_same() shortcut 😅 A dirty solution for composite generation could be to swap coordinates inside CompositeBase after the check_geolocation checks in match_data_arrays have passed, or even to swap coordinates inside the resampled Scene when the resampling is done, so that the arrays could be used together also outside of composites. But maybe there's something nicer...
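A minimal sketch of that coordinate-swapping idea (match_spatial_coords is a hypothetical helper, not an existing satpy function, and the arrays are synthetic):

```python
import numpy as np
import xarray as xr

def match_spatial_coords(reference, other):
    """Hypothetical helper: copy the x/y coordinates from reference onto
    other, so xarray arithmetic no longer drops dimensions over ~1e-10
    coordinate differences."""
    return other.assign_coords(
        {d: reference.coords[d] for d in ("x", "y") if d in other.coords})

y = np.linspace(0.0, 1.0, 4)
ref = xr.DataArray(np.zeros((4, 4)), dims=("y", "x"), coords={"y": y, "x": y})
noisy = xr.DataArray(np.ones((4, 4)), dims=("y", "x"),
                     coords={"y": y + 1e-10, "x": y})

fixed = match_spatial_coords(ref, noisy)
diff = ref - fixed
print(diff.sizes["y"], diff.sizes["x"])  # 4 4 (dimension preserved)
```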

In fact, the issue exists outside of composites as well, it can be simplified to:

scn = Scene(filenames=filenames, reader='fci_l1c_nc')
scn.load(['vis_06', 'vis_08'], upper_right_corner='NE')
res_scn = scn.resample(resampler='native')
diff = res_scn['vis_06'] - res_scn['vis_08']
diff
Out[1]: 
<xarray.DataArray (y: 0, x: 22272)>
dask.array<sub, shape=(0, 22272), dtype=float64, chunksize=(0, 22272), chunktype=numpy.ndarray>
Coordinates:
  * y        (y) float64 
  * x        (x) float64 -5.568e+06 -5.567e+06 ... 5.567e+06 5.568e+06
    crs      object PROJCRS["unknown",BASEGEOGCRS["unknown",DATUM["unknown",E...

(note the 0-size dimension)

I'm not sure this explains the AHI and ABI failures some have reported but I'll try to take a look. Good finds so far. Let me see if I understand the segmented flipping: We have this chunk of code:

if target_northup != current_northup:
    dataset, area_extents_to_update = _flip_dataset_data_and_area_extents(dataset, area_extents_to_update,
                                                                          "upsidedown")
if target_eastright != current_eastright:
    dataset, area_extents_to_update = _flip_dataset_data_and_area_extents(dataset, area_extents_to_update,
                                                                          "leftright")
dataset.attrs["area"] = _get_new_flipped_area_definition(dataset.attrs["area"], area_extents_to_update,
                                                         flip_areadef_stacking=target_northup != current_northup)

which says "OK, we need to flip the data in one or more ways; flip the array and the extents as needed", and then "create a new copy of the area with the flipped extents". If my understanding is correct, then I suggest that _get_new_flipped_area_definition (or at least the call to it) be replaced with an _apply_flipped_area_definition_and_coords, or that the call to _get_new_flipped_area_definition simply be followed by:

del dataset.coords["x"]
del dataset.coords["y"]
del dataset.coords["crs"]
dataset = add_crs_xy_coords(dataset, dataset.attrs["area"])

With a comment to maybe say "floating point precision may produce different coordinate results depending on which extents are used".

Oh! I figured out the problem with Scene.crop!

So the code:

scn = Scene(...)
scn.load([A, B, C])
cropped_scn = scn.crop(xy_bbox=[1, 2, 3, 4])
resampled_scn = cropped_scn.resample(resampler="native")
resampled_scn["C"].coords["y"].values - resampled_scn["A"].coords["y"].values

will rarely return equal values with the new BaseResampler always skipping src_area == dst_area cases. The reason is the same as for the FCI flipped areas: the extents for the new coordinates in the resampler are being computed from the cropped version of the original area. The coordinates on the finest resolution DataArray(s) are a subset (a slice) of the coordinates originally computed from the whole source area. Here's ABI:

In [1]: import numpy as np; from satpy import Scene; from glob import glob

In [2]: scn = Scene(reader="abi_l1b", filenames=glob("/data/satellite/abi/20180623/OR_ABI-L1b-RadF-M3C0[1234]*.nc"))

In [3]: scn.load(["C01", "C02", "C03", "C04"])

In [10]: cscn = scn.crop(xy_bbox=[-20000, -20000, 10000, 10000])

# The same resolution bands are still just subsets of the same original band coordinates
In [11]: np.count_nonzero(cscn["C01"].coords["y"].values - cscn["C03"].coords["y"].values)
Out[11]: 0

In [13]: cscn_resamp = cscn.resample(resampler="native")

# Now, looking at C01 and C02 (instead of C03 like before) the coordinates are not the same
In [15]: cscn_resamp["C01"].coords["y"].values - cscn_resamp["C02"].coords["y"].values
Out[15]:
array([ 8.53106030e-10,  6.98491931e-10,  5.42058842e-10,  3.87444743e-10,
        2.31921149e-10,  7.63975549e-11,  8.53106030e-10,  6.97582436e-10,
        5.42058842e-10,  3.87444743e-10,  2.31921149e-10,  7.63975549e-11,
       -7.82165444e-11,  6.97582436e-10,  5.42058842e-10,  3.87444743e-10,
        2.31921149e-10,  7.63975549e-11, -7.82165444e-11,  6.96672942e-10,
        5.42058842e-10,  3.87444743e-10,  2.31011654e-10,  7.63975549e-11,
       -7.82165444e-11,  6.96672942e-10,  5.42058842e-10,  3.87444743e-10,
        2.31011654e-10,  7.63975549e-11, -7.82165444e-11,  6.96672942e-10,
        5.42058842e-10,  3.87444743e-10,  2.31011654e-10,  7.45785655e-11,
       -7.82165444e-11,  6.96672942e-10,  5.40239853e-10,  3.87444743e-10,
        2.31011654e-10,  7.45785655e-11, -7.82165444e-11,  6.96672942e-10,
        5.40239853e-10,  3.87444743e-10,  2.31011654e-10,  7.45785655e-11,
       -7.82165444e-11,  6.96672942e-10,  5.40239853e-10,  3.87444743e-10,
        2.31011654e-10,  7.27595761e-11, -8.00355338e-11,  6.98491931e-10,
        5.38420863e-10,  3.85625754e-10,  2.32830644e-10,  7.27595761e-11])
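The ABI numbers above can be reproduced without satpy: coordinates sliced from an array computed over the full area, versus coordinates recomputed from the cropped area's shifted origin, are mathematically identical but not necessarily bit-identical. A synthetic sketch (the origin and pixel size are made up, merely resembling a geostationary grid):

```python
import numpy as np

x0 = -5434894.885056  # made-up full-disk origin
dx = 500.0000000321   # made-up pixel size with floating point "dirt"
n = 1000

# coordinates computed once over the full area, then sliced (what crop does)
full = x0 + dx * (np.arange(n) + 0.5)
sliced = full[100:200]

# coordinates recomputed from the cropped area's shifted origin
# (what the resampler does when it rebuilds the coords)
recomputed = (x0 + dx * 100) + dx * (np.arange(100) + 0.5)

# equal to within ~1e-9 due to non-associativity, but not guaranteed
# bit-identical
print(np.abs(sliced - recomputed).max())
```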

I'll think about this after my next meeting. We can always work around the new base resampler, but I'm a little annoyed that we have to.

Ok I've been trying to think about this the last couple hours and I think we can re-implement the equivalent behavior of the base resampler (pyresample PR required), but I see this as more of a band-aid than really solving the problem.

At the heart of our issue is something we can't really change, or at least not reasonably: xarray uses coordinates to determine how to combine DataArrays (arithmetic, etc.). I think the cheating way would be to set "arithmetic_join" to "override" in xarray's set_options.

Next thought is how to look at this problem from Satpy's point of view and what Satpy/we can do to handle things. The subset (a.k.a. slice) of x/y coordinate arrays is not equal to x/y coordinates generated from an area that is a subset/slice of the original area. So one initial option I thought of was that we could force the update of spatial coordinates (x/y) after any operation affecting the .attrs["area"] of a DataArray, for example cropping and resampling. But this assumes that we are doing these operations on every DataArray we'll end up using in the future. If I did a Scene.resample, there's no reason I couldn't pass a specific set of product names with datasets=["a", "b", "c"], then do another resample with a different set of datasets and different resampling arguments (radius of influence, weighting, etc.), and then combine all the results together. These operations would need knowledge of every other DataArray you're going to work with in the future in order to stay combinable.

Other things to consider:

  1. Improve precision of coordinate calculations in pyresample so this doesn't happen. One thing might be ensuring that 64-bit floats are used everywhere possible. Another thing might be to somehow (efficiently) round or chop off coordinate calculations that go beyond the precision we have available.
  2. Drop x/y coordinates from Satpy generated DataArrays. Optionally use an accessor (geoxarray?) to get the x/y coordinates on the fly if needed. This is a huge backwards compatibility issue and a direction I don't want Satpy to go in (it makes it harder to collaborate with external projects).
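The rounding idea in (1) could look something like this hypothetical helper (round_coords is illustrative, not an existing pyresample function); note that a value sitting close to a rounding boundary could still tip either way:

```python
import numpy as np

def round_coords(coords, decimals=6):
    """Hypothetical mitigation: round coordinates (in metres) so that
    ~1e-9 floating point noise cannot break xarray alignment."""
    return np.round(coords, decimals)

a = np.array([5567749.999637695])  # corner coordinate from the original area
b = np.array([5567749.999637694])  # same corner recomputed after flipping
print(np.array_equal(a, b))                              # False
print(np.array_equal(round_coords(a), round_coords(b)))  # True
```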

I also encountered some problems under another topic which may be related to this one.
#2557 (comment)

Thanks @djhoese for your thoughts!

Improve precision of coordinate calculations in pyresample so this doesn't happen. One thing might be ensuring that 64-bit floats are used everywhere possible.

I'm not sure this would fix the issue reliably: in the case of FCI, for example, we're already explicitly using float64 in the extents calculation, but we still end up with floating point differences (see my edit in the previous comment: it happens just by flipping the area, not even cropping :/).

Another thing might be to somehow (efficiently) round or chop off coordinate calculations that go beyond the precision we have available.

Rounding/chopping could maybe be more successful, even though there's still the risk that the floating point difference tips the rounding/chopping to one side or the other, likely causing the same failures..