zonal_stats does not include expected raster cells
Tinkaa opened this issue · 4 comments
I created a toy xr.DataArray
and gpd.GeoDataFrame
. When applying zonal_stats
to this, the results are not as expected. It identifies 2 cells within one polygon and 1 within the other. I would expect the result to have 4 cells in the one polygon, and 0 in the other. I have tried to dig into the code what causes this but haven't been able to understand it. I am guessing it has something to do with the affine or how the polygons are defined.
When I apply rio.clip
, it does work as expected.
Do you know what causes this unexpected behaviour of zonal_stats
?
All code is shown below
import xarray as xr
from shapely.geometry import Polygon
import geopandas as gpd
import matplotlib.pyplot as plt
from rasterstats import zonal_stats
import rioxarray
da = xr.DataArray(
[[1,2,3],
[4,5,6]],
dims=("x", "y"),
coords={"x": [0.5,1.5], "y": [2.5,1.5,0.5]},
)
d = {'geometry': [Polygon([(0, 0), (0,2), (2, 2), (2, 0)]),Polygon([(2,0),(2,2),(3,2),(3,0)])]}
gdf = gpd.GeoDataFrame(d)
zonal_stats(
vectors=gdf,
raster=da.values,
affine=da.rio.transform(),
stats=["mean","count"],
all_touched=True,
)
Outputs: [{'mean': 4.5, 'count': 2}, {'mean': 6.0, 'count': 1}]
#plotting to illustrate the data
fig, ax = plt.subplots()
da.plot(ax=ax,x="x",y="y")
gdf.boundary.plot(ax=ax, color='r', lw=1)
ax.set_xlim(0,4)
ax.set_ylim(0,4)
Hi Tinka, I am experiencing similar behavior with the zonal_stats function. Have you figured out what the issue was in the meantime?
I'm a colleague of Tinka's, answering on her behalf since she no longer works at our org -- unfortunately no, we have not looked into the issue any further.
Thanks @Tinkaa - excellent bug report and I'm sorry I didn't get to it sooner (I don't use rasterstats professionally anymore so it's challenging for me to keep up with it in my spare time)
I'm able to reproduce this bug; I'll take a look and add this to the test suite!
Figured it out. Rasterio requires arrays in (row, column) order.
The rioxarray
DataArray returns its values in (column, row) order. I have no idea why it behaves this way, but it does. IMO it's a bug - every GIS program for decades has used column-row order and switching this is a very bad practice. For now, the only solution is to create your raster arrays directly or transpose
the values to get the proper order.
raster=da.values.transpose(),
With that modification, your script prints the correct values
[{'mean': 4.0, 'count': 4}, {'mean': None, 'count': 0}]