pytroll/pyresample

EWA resampling in 1.27 is four times slower than in 1.26.1

yukaribbba opened this issue · 69 comments

Code Sample, a minimal, complete, and verifiable piece of code

Some regular satpy code here:

area_id = "abc"
projection = "..."  # some PROJ4 string
area_def = create_area_def(area_id=area_id, projection=projection, resolution=250)
scn = scn.resample(area_def, resampler="ewa", weight_delta_max=10, weight_distance_max=1)

Problem description

To produce a 250m rayleigh-corrected true color scene from MODIS L1B:
1.26.1: 83.054s (with CPU almost fully loaded)
1.27: 321.692s (CPU usage around 20% all the time)

Versions of Python, package at hand and relevant dependencies

python: 3.11.3
satpy: 0.42.3-dev
dask: 2023.5.0

How did you install pyresample in this case?

What is area_def? Can you print it out? Based on what you've provided I'm guessing it is a DynamicAreaDefinition. Can you test with nearest neighbor resampling between the two versions? My guess is it isn't EWA that got slower, but loading of the lon/lat data from the swath definition.

Additionally, how many dask workers are you using? Are you customizing your dask array chunk size at all or any other dask-related options?

For my own reference (and yours if you're curious), here are the diffs between the two versions and specifically looking at geometry.py (no EWA files were changed in this last release):

v1.26.1...v1.27.0#diff-631d3c6d4c

But the majority of the work between these two releases was:

  1. New documentation
  2. Add better warning messages or remove unnecessary warning messages
  3. Remove deprecated code

The only thing inside pyresample that I can see affecting your EWA usage is that for number 2 above I switched to the newer pyproj Transformer interface instead of using the deprecated/old Proj interface. I'd be very surprised if this broke something though as it is supposed to result in the same low-level code being used.
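For context, the general shape of that change in pyproj terms looks roughly like this (an illustrative sketch using the projection string from this thread, not the actual pyresample diff):

from pyproj import Proj, Transformer

proj_str = "+proj=lcc +datum=WGS84 +ellps=WGS84 +lat_1=25 +lat_2=47 +lon_0=105 +units=m +no_defs"
lon, lat = 105.0, 30.0

# Old-style interface: a Proj object called directly on lon/lat.
p = Proj(proj_str)
x_old, y_old = p(lon, lat)

# Newer Transformer interface: an explicit transform from geographic
# coordinates to the target CRS; always_xy=True keeps lon/lat axis order.
t = Transformer.from_crs("EPSG:4326", proj_str, always_xy=True)
x_new, y_new = t.transform(lon, lat)

# Both paths should produce essentially the same projected coordinates.
print(x_old, y_old)
print(x_new, y_new)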

I'm on my phone now so I'll test nearest neighbor tomorrow. I installed pyresample as a dependency of satpy. Days ago I updated my packages, then I found something not right, so today I rolled back to 1.26.1. Besides that rollback, I didn't change any other packages or environment variables -- my PYTROLL_CHUNK_SIZE=2200, DASK_NUM_WORKERS=8 and OMP_NUM_THREADS=8.
That area_def is a dynamic one, yes. I leave the area extents for pyresample to calculate itself.

I tested the same scene under the same environment.

Nearest:
1.27: 127s
1.26.1: 124s
(CPU fully loaded)

Bilinear:
After it consumed almost all of my 64GB of memory and a lot of time, I had to stop it. (Happened in both versions)

Thanks for testing that. I had a busy day and ran out of time to test this myself but it's on my TODO list. If it really is the same environment (same versions of dask and xarray) then it must be either a Cython difference when the package was built, a conda-forge build change, or a weird side effect of something that changed in pyresample. At least those are my guesses right now.

Not directly related, but I also pinned a project to v1.26 due to Cython dependency changes in 1.27 that complicated a docker setup.

@augustinh22 Do you have any extra details on that? What do you mean?

@yukaribbba did you install Satpy/pyresample via pip or with conda?

Conda. I use miniforge.

@yukaribbba What operating system are you on?

For my own future reference, the last build of pyresample on conda-forge for Python 3.11 used numpy 1.23.5 (other python versions use numpy 1.21). Cython version used was 0.29.34 (also from conda-forge). The build for 1.26.1 used Cython 0.29.33 and the exact same build of numpy as the 1.27 release.

Windows 11 22h2

Running on main currently, here's my test script:

import os
os.environ["OMP_NUM_THREADS"] = "1"
os.environ["PYTROLL_CHUNK_SIZE"] = "6400"

from satpy import Scene
from glob import glob
from pyresample import create_area_def
import dask

dask.config.set(num_workers=4)

scn = Scene(reader="viirs_sdr",
            filenames=(glob("/data/satellite/viirs/conus_day/SVI04*t180[1234]*.h5") +
                       glob("/data/satellite/viirs/conus_day/GITCO*t180[1234]*.h5")))
scn.load(["I04"])
area_def = create_area_def("test", {"proj": "eqc"}, resolution=1000)
new_scn = scn.resample(area_def, resampler="ewa", weight_delta_max=40.0, weight_distance_max=2.0)
_ = new_scn["I04"].compute()

Running in a Python 3.10 conda-based environment. Installed pyresample from source with pip install -e . --no-deps, then manually did python setup.py build_ext --inplace to make sure the extensions were built with my installed version of Cython. For my first run they were built with Cython 0.29.34 (same as the 1.27 conda-forge package). I then ran the above Python code with PyCharm Pro's profiling, which took a total of 9.697s on my laptop.

I then moved the generated .so files and the generated Cython _ll2cr.c and _fornav.cpp files to a new directory ext_cy02934 to be sure they were gone. With Cython 0.29.33 (same as the 1.26.1 conda-forge build) I get 7.485s for this script. So a 2s difference just from the Cython version... not great.

I then upgraded to Cython 0.29.35, a version newer than the one used for the 1.27 conda-forge build, and it still takes 9.635s.

This is not going to be fun.

Oops, the version I started with was actually Cython 0.29.35, not 0.29.34. 0.29.34 took 7.8s, so decently close to the 0.29.33 result. Now to try with 1.26.1 of pyresample.

Pyresample 1.26.1 locally | Cython 0.29.34 | 7.105s

I diffed the two cython generated C files and they are exactly the same. So it suggests that maybe my test is not very representative of the issue. Let's try mixing in some other bands.

Just for the record, the time for the 1.26.1 code with Cython 0.29.33 was 7.467s.

I bumped my script up with more granules and 5 I-bands for the VIIRS data I'm reading. I get ~20.5s of execution time for both. I'm not seeing this 4x slowdown, but I'll keep trying things. I'll try installing directly from conda-forge.

Using the conda packages with my updated script I get 27.0s for pyresample 1.26.1 and 24.8s for pyresample 1.27.0, so it is actually faster in the new version. Granted, I'm not investigating dask diagnostic plots to see CPU utilization during these executions, but given the performance difference I'm not that concerned.

So our next path of debugging: @yukaribbba you said you're on Windows and Python 3.11. @augustinh22 you mentioned docker images so I'm guessing you're not using Windows as your container's OS?

I'm using dask 2023.5.0 which is what you said you're using. The only difference now is the Python version, but I'd be surprised if that made a huge difference in performance inside Cython code. Maybe it is worth a shot but I don't have time for it now. Otherwise, it is a Windows-only issue. Or something about your environment? I guess I could also try a larger (higher resolution) target area. 🤷

Sorry to the people who are running into issues, but I need more info to get a reproducible failure.

Edit: Or point out where I'm doing something wrong.

Sorry for all the messages. I increased the size of my area by setting the resolution to 250m. With conda-forge 1.26.1 I get 62s. With 1.27.0 I get 113s. This is still Python 3.10.

Nevermind, I was looking at the wrong time. The 62s was actually supposed to be 113s. So still not able to reproduce this.

Otherwise, it is a Windows only issue.

I'll try on a Ubuntu VM.

The problem remains in my Linux VM. The environment is:

  • 8-core, 32GB RAM virtual machine on VMware Workstation Pro 17.0.2
  • Ubuntu 22.04.2 updated to the latest version
  • python=3.11.3 on miniforge3
  • satpy from conda-forge by command "conda install satpy" (no need to add "-c conda-forge" in miniforge3)
  • 1.27 is automatically installed with satpy; 1.26.1 was later installed through "conda install pyresample=1.26.1"
  • dask=2023.5.1, xarray=2023.5.0
  • during the whole test, the only change is pyresample
  • PYTROLL_CHUNK_SIZE=2200, DASK_NUM_WORKERS=8 and OMP_NUM_THREADS=8
  • same MODIS dataset to true-color, rayleigh-corrected image

1.26.1: 105s
1.27: 292s

Maybe you need to try building the env directly from conda just like me, not using cython locally?

Very interesting.

Maybe you need to try building the env directly from conda just like me, not using cython locally?

I tried both. I started with an existing conda-based (conda-forge packages) environment and uninstalled pyresample. I installed it from conda-forge and built it locally with different cythons.

I'll put together a Python 3.11 environment and try that. If I can't reproduce it then I'll try MODIS data instead of VIIRS data. If that still doesn't reproduce it then I'll try your exact chunk size and the true_color composite. Your chunk size is not necessarily optimal (I think) for MODIS data, but maybe I'm misremembering something.

Note that setting OMP_NUM_THREADS=8 is likely not helping you at all and could actually be hurting you, depending on what OpenMP libraries end up getting used in your processing.

Just discovered: there is a chance that PyCharm was pulling in my local dev version of pyresample when I ran my timing script, even when I wanted it to use the conda package of pyresample. I still would have expected differences with my Cython rebuilds, but yeah, I just wanted to point this out.

I think I reproduced it! Python 3.11 may be the primary issue. Pyresample 1.26.1 took 62.9s, 1.27.0 took 116.5s. Now to see what happens with a local development version.

From my experience it also occurs with VIIRS SDR. I also tried different composites (true color, overview, true/false sunz-corrected, true/false rayleigh-corrected); none of them survived.

I actually have no idea what that OMP setting means; I only set it because the satpy documentation said so. :sweat_smile: Maybe I need to disable it.

Is there a debug switch in pyresample or dask, like the one in satpy?

What do you mean by debug switch? The debug_on? That's actually just a convenience for turning on debug logging. I think instead of debug_on you could cheat and do:

import logging
logging.basicConfig(level=logging.DEBUG)

That says that all loggers should print out DEBUG level output.

Never mind... since you've already reproduced the issue.

main | Cython 0.29.33 | 112s
main | Cython 0.29.34 | 113s
1.26.1 (local) | Cython 0.29.34 | 62.8s

Ok so now to walk through pull requests until I figure out what happened.

It looks like it was this comparison added by @mraspaud here:

https://github.com/pytroll/pyresample/pull/508/files#diff-0275305df517377d471ba5034726f76701f86ee539d49547684040bbf78fd1a6R124-R125

So PR #508. I have to go offline now, but given that this is the issue, my guess is that it is comparing the lon/lats of all of the swath definitions. It should likely be doing an is identity check for dask arrays. I'm not sure why it isn't.

Oh but also I only saw this on Python 3.11 so it must be some optimization that Python 3.11 is making that is skipping over some of our shortcuts. This could even be a problem with dask in Python 3.11. It should be easy enough to check I just need to have the time. I'll try later today.

Ok, I don't know what I was doing with my Python 3.10 environment yesterday, but I'm able to comment out this if statement in Python 3.10 and 3.11 and get the difference in performance (better with the if statement commented out).

@mraspaud we need to think of a better way to do if source_def == target_def. For the current case, the SwathDefinition.__eq__ method is called; it gets the lons/lats from the AreaDefinition, which produces numpy arrays. Then it doesn't get caught in any of the shortcuts (is identity checks, .name dask array checks, etc.) so it goes all the way to np.allclose.

However, I'm still not sure why @yukaribbba didn't see performance differences in nearest neighbor since I would have expected this to result in the same thing.

Ah the nearest neighbor resampler doesn't use this base class. So my guess is this if statement has slowed every resampler class that is based on this base class.

Edit: Swath-compatible gradient_search is also ~50s slower, but it isn't dask optimized so we're talking 4 minutes versus almost 5 minutes of run time.
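To illustrate the kind of early-exit check I mean, here's a rough sketch (the function name and structure are purely illustrative, not the actual pyresample code or the eventual fix):

import numpy as np

def defs_probably_equal(source_def, target_def):
    # Cheapest check first: the exact same geometry object.
    if source_def is target_def:
        return True
    lons1, lats1 = source_def.get_lonlats()
    lons2, lats2 = target_def.get_lonlats()
    # Same underlying arrays (shared numpy/dask objects) -- no need to
    # compare contents at all.
    if lons1 is lons2 and lats1 is lats2:
        return True
    if lons1.shape != lons2.shape:
        return False
    # Only now fall back to the expensive element-wise comparison; with
    # dask-backed swaths this is the step that computes everything.
    return bool(np.allclose(lons1, lons2, equal_nan=True) and
                np.allclose(lats1, lats2, equal_nan=True))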

@yukaribbba an initial fix is implemented in #520. If you're able to test it it would be much appreciated.

Thanks! That was quick. I'm offline now so will test it tomorrow.

83s, back to normal performance! 😆

There's another question here. I remember that in P2G 2.3 the resolution of an output image (EWA resampled) was exactly the same as what I set in grids.conf. I'm not sure which version of pyresample it used. But the recent versions (1.26.1, 1.27, this PR) will produce something like res_x=375.018139808887099 / res_y=-375.094414093355681 when I set the resolution to 375m. This will cause some problems when I do mosaic operations.

P2G now uses pyresample's create_area_def utility function for all grid/area creation. If you're creating static grids then the resolutions (I think) are manipulated a bit to have whole pixel sizes. You'd have to adjust the number of pixels or extents so that you have nice even numbers of pixels. If this is a dynamic area then there may be other reasons. Could you create an issue on the polar2grid repository and include an example of the grid definition you are using?

It's a dynamic area. I'm using satpy with pyresample under Windows now, but I also tried P2G 3.0 in a virtual machine in the past. Both of them have this issue, so I guess it may not be a P2G issue. For the grid definition, it's just a normal one in grids.conf, with a PROJ4 string and x/y resolutions.

It's not really an "issue" depending on how you define your area. If you had a 2x2 pixel grid that you wanted at 100m resolution but also wanted it to have extents that are 300m apart (instead of 200m) then one of those values has to change. Can you paste your definition here and I can take a quick look? I guess I would have expected resolution to take priority in the dynamic area case.
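To put hypothetical numbers on that conflict (these are made up, not your actual grid):

# A 2-pixel-wide grid where the user asks for 100 m pixels but the stored
# extents span 300 m: one of the two has to give, and an AreaDefinition
# keeps the extents.
width = 2
x_min, x_max = 0.0, 300.0
requested_res = 100.0
derived_res = (x_max - x_min) / width
print(requested_res, derived_res)  # 100.0 vs. 150.0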

I just noticed you mention grids.conf. You may benefit from using the grids.yaml format used in P2G 3.0. I'd still like more details on which grids you're talking about though. If the projection is a lon/lat one and the resolution is specified in meters (or vice versa) then the units are converted when the area is created which can definitely add some floating point precision issues and is overall an estimate.

I got a comparison here.

P2G 2.3
command line
polar2grid.sh crefl gtiff --true-color --fornav-D 10 --fornav-d 1 --compress NONE --grid-configs /home/ll/grids.conf -g lcc_cn_250 -f /mnt/hgfs/Downloads/Sat/Polar/terra_2000360_0420/
grids config

# grid_name,            proj4, proj4_str,                                                                                width,  height, pixel_size_x, pixel_size_y,           origin_x,          origin_y
lcc_cn_250,             proj4, +proj=lcc +datum=WGS84 +ellps=WGS84 +lat_1=25 +lat_2=47 +lon_0=105 +units=m +no_defs,      None,    None,          250,         -250,               None,              None

output

Size is 11292, 9806
Pixel Size = (250.000000000000000,-250.000000000000000)
Upper Left  (-1977413.430, 4596402.652) ( 81d56'30.87"E, 37d43'23.64"N)
Lower Left  (-1977413.430, 2144902.652) ( 87d 8'13.43"E, 16d 4'57.43"N)
Upper Right (  845586.570, 4596402.652) (115d 0'53.64"E, 39d29'25.04"N)
Lower Right (  845586.570, 2144902.652) (112d42'36.22"E, 17d22'58.13"N)
Center      ( -565913.430, 3370652.652) ( 99d 9'36.23"E, 28d30'12.57"N)

400% enlarged
image

satpy with pyresample(this PR) (environment mentioned above)
code

# using GenericCompositor just as P2G 2.3
files = find_files_and_readers(base_dir=folder, reader="modis_l1b")
scn = Scene(filenames=files, reader_kwargs={'mask_saturated': False})
scn.load(["true_color"])
area_id = "lcc_cn_250"
projection = "+proj=lcc +lon_0=105 +lat_1=25 +lat_2=47 +datum=WGS84 +ellps=WGS84"
res = 250
area_def = create_area_def(area_id=area_id, projection=projection, resolution=res)
scn = scn.resample(area_def, resampler="ewa", weight_delta_max=10, weight_distance_max=1)

output

Size is 11308, 9811
Pixel Size = (249.995323508576064,-249.988563077171307)
Upper Left  (-1977413.430, 4596403.063) ( 81d56'30.86"E, 37d43'23.65"N)
Lower Left  (-1977413.430, 2143765.270) ( 87d 8'20.18"E, 16d 4'22.48"N)
Upper Right (  849533.688, 4596403.063) (115d 3'40.73"E, 39d29'11.59"N)
Lower Right (  849533.688, 2143765.270) (112d44'42.25"E, 17d22'12.53"N)
Center      ( -563939.871, 3370084.166) ( 99d10'50.64"E, 28d29'57.85"N)

400% enlarged
image

Well you can clearly see that 2.3 has better details and the resolution is just clean.

Oh very interesting. I agree. This is bad. Let me see if I can decipher the code.

My test code (for my own reference):

from pyresample.test.utils import create_test_longitude, create_test_latitude
from pyresample import DynamicAreaDefinition

area_def = DynamicAreaDefinition(projection="+proj=lcc +datum=WGS84 +ellps=WGS84 +lat_1=25 +lat_2=47 +lon_0=105 +units=m +no_defs", resolution=(250.0, 250.0))
frozen_area_def = area_def.freeze(lonslats=(create_test_longitude(125.0, 135.0, (200, 100), twist_factor=0.1), create_test_latitude(25.0, 35.0, (200, 100))))

frozen_area_def.pixel_size_x
# 249.9988581543627

frozen_area_def.pixel_size_y
# 249.9987629440828

Our/my goal is to get this resolution closer to the specified resolution. Granted, it is pretty close, but let's see if I can make it better. The main difficulty is that in P2G 2.3, the internal "Grid definition" objects used pixel resolution as a primary property while AreaDefinitions use the extents.

Edit: Changed twist_factor to 0.1 which gives a more reasonable result with shape (8061, 9093) and:

In [24]: frozen_area_def.pixel_size_x
Out[24]: 249.9906672368728

In [25]: frozen_area_def.pixel_size_y
Out[25]: 250.00687632884492

@yukaribbba Check out #523. If you notice the 2 line change in geometry.py, you should be able to hack this solution in your own copy of the code and test if it fixes your use case. Let me know how it goes.

Reopening just to keep track of the resolution issue too.

Size is 11308, 9811
Pixel Size = (250.000000000000000,-250.000000000000057)
Upper Left  (-1977413.430, 4596515.270) ( 81d56'29.76"E, 37d43'27.26"N)
Lower Left  (-1977413.430, 2143765.270) ( 87d 8'20.18"E, 16d 4'22.48"N)
Upper Right (  849586.570, 4596515.270) (115d 3'43.46"E, 39d29'15.09"N)
Lower Right (  849586.570, 2143765.270) (112d44'43.98"E, 17d22'12.40"N)
Center      ( -563913.430, 3370140.270) ( 99d10'51.49"E, 28d29'59.74"N)

This is the same scene I mentioned above. It's quite close... I guess that ".57" is some kind of floating point issue, but I haven't seen such a close call in P2G 2.3.
I also tested two other adjacent scenes. They're both at exactly 250m. No floating point issues. So I think it's a good result, maybe?

Yeah, definitely a floating point issue. I'm not sure I can do too much about it, but maybe @mraspaud will have ideas in my PR. At that many decimals out we're beyond 32-bit precision and at the end of 64-bit precision I think.

As you can see in my PR I'm taking the left-most-extent + (width - 1) * x_resolution where width is an integer and x_resolution is the floating point number the user provided. The extent is calculated from the provided lon/lats. I suppose those could be rounded to the nearest meter, but we'd have to be careful not to do that in the degrees case. We could always round the number to the nearest 32-bit float I suppose just to avoid unnecessary precision weirdness but that seems like a hack. Any other ideas?

Ok. For me it's acceptable, but I'm still curious why 2.3 doesn't have this issue. Maybe it does some rounding operations like you just said?

No, it is that P2G 2.3 doesn't use pyresample's AreaDefinitions. The GridDefinition object used in P2G 2.3 holds on to the pixel resolution as one of the primary properties of the grid (projection, resolution, shape, upper-left origin). An AreaDefinition holds on to (projection, extents, shape). So in the area definition case the pixel resolution is calculated and then passed to GDAL/rasterio when it makes the geotiff. In the P2G v2.3 grid definition case it just passed the resolution that was explicitly provided by the user.
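To make that concrete with the numbers from the gdalinfo output earlier in this thread, re-deriving the pixel size from the stored extents and shape is exactly where the odd decimals come from:

# Width and the upper-left/upper-right x coordinates from the
# satpy/pyresample output above.
width = 11308
x_min = -1977413.430
x_max = 849533.688

pixel_size_x = (x_max - x_min) / width
print(pixel_size_x)  # ~249.995323..., not the 250.0 that was requested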

I see. I agree that we'd better not do that rounding, for precision reasons. So what about the quality?

What do you mean by quality? I think rounding the stuff in the dynamic area definition to a certain number of decimal points is fine. Especially since it will have the explicit purpose of preserving what the user asked for.

Hm even when I do less math to get the extent by doing:

            left_x = corners[0] - x_resolution / 2
            bot_y = corners[1] - y_resolution / 2
            area_extent = (left_x,
                           bot_y,
                           left_x + width * x_resolution,
                           bot_y + height * y_resolution)

inside the DynamicAreaDefinition I still get a little floating point precision error at the end.

Well actually I was talking about the image quality there...

Yeah, sorry, I wasn't confusing rounding and quality, I just wasn't sure what your concern was with quality. I assumed that this new version of the code where both resolutions (x and y) end up really really close to the requested resolution meant the resulting image was equivalent to P2G v2.3.

I assumed that this new version of the code where both resolutions (x and y) end up really really close to the requested resolution meant the resulting image was equivalent to P2G v2.3

Yeah I was also hoping that but....
This code:
image
P2G 2.3:
image

Looks like the resolution and the quality are two things here.

What instrument is this? How did you generate the bad quality one? Same resampling algorithm? What is the geolocation information in the geotiff (resolution, projection, size, etc)?

Oh sorry, it's the same MODIS scene as above. The code/command lines are also the same here.

The default true color for MODIS in satpy is not sharpened and does not match what P2G 3.0 does sharpening-wise. Therefore I would expect reduced quality.

Hmmm that's a P2G 2.3 image. Does 2.3 also do the sharpening?

So I ran another test on the true color, but with RatioSharpenCompositor. The code and scene remain the same.

RatioSharpenCompositor
image

GenericCompositor
image

P2G 2.3
image

Whether it's sharpened or not, it looks blurred compared with the 2.3 version.

What does your sharpened YAML definition look like? What files are you providing to Satpy and Polar2Grid?

Whoops... I used the wrong YAML file, one I had tested for neutral_resolution... Sorry for the mess...
After sharpening it does look like 2.3.

Back to that rounding topic. Can we expand the area to be just a little larger instead of rounding it? Like:

# extent calculated by DynamicArea
min_x = -1977413.430
min_y = 2143765.270
max_x = 849533.688
max_y = 4596403.063
res = 250

if (max_x - max_y) % res != 0:
    min_x = round(min_x)
    max_x = round(max_x)
    rem = (max_x - min_x) % res
    add = 2 * res - rem
    if add % 2 == 0:
        min_x = min_x - add / 2
        max_x = max_x + add / 2
    else:
        min_x = min_x - (add + 1) / 2
        max_x = max_x + (add - (add + 1) / 2)
print(min_x, max_x)
-1977565.0 849685.0

This way we can find extent values whose span is evenly divisible by the resolution, without hurting precision.

I'm not sure I understand the math here (mainly the last if/else block). I'm also not sure how safe doing round(min_x) is for degree projections or any other units that might come up (kilometers). What about "rounding" the extents to the nearest multiple of the resolution using floor and ceil:

min_x = -1977413.430
min_y = 2143765.270
max_x = 849533.688
max_y = 4596403.063
res = 250

import math

math.floor(min_x / res) * res
# -1977500

math.ceil(max_x / res) * res
# 849750

Or I guess we could even do half or quarter of res to not do entire pixel shifts of the extent.

Ok I got a little fix here.

# extent calculated by DynamicArea
min_x = -1977413.430
min_y = 2143765.270
max_x = 849533.688
max_y = 4596403.063
res = 250

if (max_x - min_x) % res != 0:
    from decimal import Decimal
    min_x = Decimal(str(min_x))
    max_x = Decimal(str(max_x))
    res = Decimal(str(res))
    # this is the gap between the ideal extent and the actual extent
    rem = (max_x - min_x) % res
    # we'll expand the whole scene by 2 pixels in width, so we need to add some values
    add = 2 * res - rem
    # split the add value into 2 parts, one for min_x, another for max_x
    min_x = min_x - add / 2
    max_x = max_x + add / 2
    print(min_x, max_x)

-1977564.871 849685.129

It also works for decimal degrees.

min_x = -20.023674
min_y = -4.375472
max_x = 70.547499
max_y = 25.520567
res = 0.00225

-20.0249625 70.5487875

You might have to adjust your test case. Your rem ends up being 0.000. I'm guessing Decimal is needed to try to preserve precision? Your if statement uses x and y which I'm guessing is a typo? I'm also thinking we may not need to do the if statement at all and just always apply this rounding (if it isn't needed the math will work that out). Another thought is, does it have to be 2 pixels in add?

I like the cleanliness of my ceil/floor solution, but yeah it could be surprising to some users (maybe?). Oh actually...I think my solution is the same as yours but instead of splitting add in half it splits it to make the extents "clean" numbers aligned with the resolution. I feel like not having any decimal portion is maybe the best way to "guarantee" no floating point issues.
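For completeness, a compact version of that ceil/floor idea as a function (just a sketch, not necessarily what will end up in the PR):

import math

def snap_extent_to_res(min_x, max_x, res):
    # Expand the extent outward so both edges land on whole multiples of
    # res, which makes the span exactly divisible by the resolution.
    return math.floor(min_x / res) * res, math.ceil(max_x / res) * res

# With the x extents from the example above:
print(snap_extent_to_res(-1977413.430, 849533.688, 250))
# (-1977500, 849750) -> a span of 2827250 m, i.e. exactly 11309 pixels at 250 m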

That's a typo. I just edited it. Decimal's precision can be very high; I tried print(Decimal('3.1415926535684573278456894357832784785478478666666666666667')) and got the same value back. We don't need to have 2 pixels; one could be enough.
But finally I think you're right. It's better to leave decimal degrees alone, so your code is better in this case.