perrygeo/python-rasterstats

Proposal for a third rasterization strategy

bet-bit opened this issue · 4 comments

Hello,

Reading the documentation about Rasterization strategy I see there are two current possible strategies:

  • default : pixel is included if its centroid is within the geometry)
  • all_touched : pixel is included if it intersects the geometry in any way
    I was wondering if implementing a third possibility, coming naturally afther those two, would be useful:
  • only_within (or somehing like that) : pixel is included only if it is contained by the geometry.

The use case I had in mind was, when working with satellite data combined with vector layer of polygons, being able to assure that all the pixels you include are totally within the polygon, in order to discard noisy pixels with information coming from other regions outside the geometry.

I guess a workaround is to make a buffer of \sqrt(2*pixel_size) and use the default stategy, but this extra processing might take longer than having it already implemented.

Hope that idea can be useful. Or maybe not that much? in that case I'll be eager to know why !

Thanks for the work you put on that,

Great idea, it would be very useful. I would love to support a more flexible mechanisms for rasterization, only_within being an obvious choice for certain types of analysis. I'd also like to support a threshold (e.g. > 66% covered).

But rasterstats relies entirely on the GDAL library to provide the rasterization algorithm so pragmatically we're stuck with the two strategies it provides. The best option to implement these ideas would be to get them into GDAL proper. How's your C++ :-)

The buffer strategy is a viable workaround, and the only way to accomplish this today.

Finally, we could bypass GDAL altogether and write a custom rasterization algorithm. We could write a native shared library that could perform rasterization exactly as GDAL currently does with a similar interface and equivalent performance. Then build python bindings to it. This complicates distribution (it's no longer just a python module) but could be an option.

Hope that answers your questions @bet-bit - it's doable but a heavy lift

Approach from #136 could be leveraged for threshold selection (with 100% being the same as using only_within)

Finally, we could bypass GDAL altogether and write a custom rasterization algorithm (...) Then build python bindings to it.

Some discussion at isciences/exactextract#34

Hi, and thanks for the answers!
I agree that the best solution would be to put all that into GDAL and from then geit it into rastertats... unfortunately I am not that good with programming, and even less with C++. We'll see if we can make that happen at some point in the future though!