r-spatial/stars

Cells counted only once when extracting into overlapping geometries

turbanisch opened this issue · 3 comments

I'm preparing to teach a short course about geospatial data in R for economists. We will be using sf for the introduction to vector data and most students are familiar with the Tidyverse. For that reason, I am inclined to use stars rather than terra or raster.

When it comes to vector-raster interactions, summarizing and extracting raster data based on vector geometries is probably the operation we use most. However, I found out that when geometries overlap, each cell is extracted only once using aggregate() or st_extract() (which, as I understand it, falls back on aggregate() in the case of polygons).

Even though the article linked above was posted in March 2021, the behavior doesn't seem to have changed. Is there another proposed workflow for this application? I know that st_join() and exactextractr can be used instead, but I am still curious to hear about the use case for st_extract()/aggregate(). Overlapping geometries are not an uncommon scenario for us, for example when we create buffers around nearby points.

edzer commented

Yes, I can see this case; the design of aggregate follows stats::aggregate and its equivalent in SQL. For st_extract it is a different story: basically a lack of time and priority.
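As a rough illustration of the stats::aggregate semantics mentioned here, the base R sketch below groups a few values by a single grouping column. The data frame `cells` and its columns are hypothetical and not taken from stars. Because each row carries exactly one group label, a value can only contribute to one group, which is presumably why a cell under two overlapping geometries is counted once.

```r
# Hypothetical toy data: four cell values, each with exactly one group label
cells <- data.frame(
  value = c(5, 6, 7, 8),
  group = c("a", "a", "a", "b")
)

# stats::aggregate computes one summary per group; a row cannot
# belong to both "a" and "b" at the same time
res <- aggregate(value ~ group, data = cells, FUN = mean)
print(res)
```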

edzer commented

After looking at the code, I think st_extract does the right thing for overlapping polygons: it calls aggregate for each polygon. This may be highly inefficient, but should do the job.

turbanisch commented

Ah, I see, thank you so much! I didn't realize that st_extract() calls aggregate() on each individual polygon, but that does indeed seem to be the case. Perhaps a short remark in the function description that it can be used on more than just point geometries would be helpful for other users? Anyway, I am looking forward to using the stars package in our course. For the teaching exercises, performance will hopefully not be too much of an issue.
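The difference discussed in this thread can be checked with a small sketch along the following lines. The toy raster, the point coordinates and the buffer distance are all assumptions made up for illustration. The sketch calls aggregate() once over both buffers and st_extract() on the same buffers, then prints both results side by side. If the behavior described above holds, the two should agree for the first buffer and differ for the second, where the cells are shared.

```r
library(sf)
library(stars)

# Hypothetical toy raster: 4 x 4 cells of size 1, values 1..16, no CRS
bb <- st_bbox(c(xmin = 0, ymin = 0, xmax = 4, ymax = 4))
r <- st_as_stars(bb, nx = 4, ny = 4, values = 1:16)

# Two nearby points whose buffers overlap (coordinates are made up)
pts <- st_sfc(st_point(c(1.5, 2)), st_point(c(2.5, 2)))
bufs <- st_buffer(pts, dist = 1.2)

# A single aggregate() call over both buffers
agg <- aggregate(r, by = bufs, FUN = mean)

# st_extract() with polygon geometries; FUN defaults to mean
ext <- st_extract(r, bufs)

# One value per buffer in each result; compare them side by side
print(agg[[1]])
print(ext[[1]])
```

For larger rasters or many polygons, the per-polygon approach may be slow, as noted above, so this is meant for small teaching examples only.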