perrygeo/python-rasterstats

2D Rasters

Closed this issue · 2 comments

I often have to apply zonal stats on multiple rasters.
From Julia were you have to program zonal stats by yourself, but it's easy. I was able to get a huge boost in speed up by stacking the rasters.

I would therefore ask to extend to from 2D to 3D rasters.

I'd be careful to analyze where the speed boost is coming from. If your raster bands are in different files, or in basically any format other than a band-interleaved-by-pixel (BIP) format, you are going to need to perform at least n read operations for every n bands. I've done extensive testing and, without introducing any parallel compute constructs, there is no performance advantage to handling multiple bands internally vs handling each band separately. Either way, you're blocking on a series of sequential IO steps.

Your Julia implementation likely gets a speed boost by parallelizing read operations (and we can assume you're operating with a very fast disk in a non-IO-constrained environment). You could do the same in Python by running multiple zonal_stats calls in parallel. Granted Python's options for parallel computation are not great (you'd probably need a threadpool, a processpool, or a distributed engine like Dask) but they exist. Putting the parallelism outside this library gives the user flexibility to tackle the parallel compute question with their own tools according to their own needs. It's by design.

There are some cases though that might benefit from rasterizing the geometry only once. In my experience, rasterization is rarely the bottleneck (typically the bottleneck is disk IO for raster reads) but it's still "wasted" computation. We could consider something like this psuedocode, which effectively means we could take an already-rasterized geometry as the vector input arg.

rg = rasterize_geometry(g)

# You could run multiple zonal_stats calls in parallel if you have a way
# to share the rasterized geom between threads/processes
zonal_stats(raster, rg, band=1)
zonal_states(raster, rg, band=2)