perrygeo/python-rasterstats

zonal_stats call returns NaN for some polygons

danbroman opened this issue · 2 comments

I've tried and failed to work out an explanation for this error and hope for some help. raster_stats returns NaN for some polygons for all statistics. I've tried using geojson and shapefile polygons, changed the map projection from a projected coordinate system (EPSG:5070) to WGS84 (EPSG:4326), and tried several versions of Python. My production environment is -

  • Python 3.8
  • numpy 1.19.4
  • gdal 3.1.4
  • 16GB system memory

I have also tried using Python 3.6 and the results are the same.

raster_stats returns NaN for 12 polygons. I've used the Zonal statistics function packaged with QGIS and the returned values are identical for all of the polygons except that QGIS returns values for all and does not return any NaN.

The polygon and raster to reproduce this error can be found here.

import rasterstats
from rasterstats import zonal_stats
tif = 'snodas_swe_20040101_Klamath_HRU_EPSG5070_english.tif'
basin_poly_path = 'Klamath_HRU_EPSG5070.shp'
tif_stats = zonal_stats(basin_poly_path, tif, stats=['min', 'max', 'median', 'mean'])

If you look at those 12 polygons, I would guess they are too small or unfortunately shaped so they don't intersect the centroid of any raster pixel (the default rasterization strategy).

Adding all_touched=True to your zonal_stats call will use an alternative strategy of including all overlapping pixels. See https://pythonhosted.org/rasterstats/manual.html#rasterization-strategy

Thanks for the quick reply! I did some checking and you're correct in what's happening. It also seems that by default the QGIS plugin uses the all_touched=False option equivalent and must flag polygons that return NaN then fills in those values using the equivalent of all_touched=True. Good to know for future reference. Thank you again for your help and the quick response