geocompx/geocompr

Section on cleaning geometries in the geometry chapter

Robinlovelace opened this issue · 11 comments

Currently this is the only mention of cleaning geometries in the book I believe:

https://github.com/Robinlovelace/geocompr/blob/3579906af69949dbe47fec783b62ad530018ce14/10-gis.Rmd#L255-L263

As @defuneste has flagged this, and anecdotal evidence suggests it's a common issue, I suggest it will be a useful section. Thoughts on best tools for the job? Options include (we can check off which to test/mention):

I'm interested in which works. People look to this book for recommendations, so if we cover something we should ensure it's tested and known to work! I've had so-so experience with st_make_valid() but it's in {sf} so should be covered first, then {spatstat} tools as they are well maintained. pprepr is not on CRAN and seems to be unmaintained.

I am just starting to use {spatstat}, so I am slowly reading the docs, book and code. Apparently {polyclip} (https://github.com/baddstats/polyclip) is used to do some "cleaning". It is used here (https://github.com/spatstat/spatstat.geom/blob/d90441de5ce18aeab1767d11d4da3e3914e49bc7/R/window.R#L230-L240).

This is in the owin class and it is probably used to avoid self-intersecting polygons.

I will have to test it a bit to get a better understanding ...

I have adapted this web page, which shows a bunch of topological errors: http://s3.cleverelephant.ca/invalid.html (it is from @pramsey; see the related blog post: https://www.crunchydata.com/blog/waiting-for-postgis-3.2-st_makevalid).

The script is here: https://github.com/defuneste/utile_comme_du_pq/blob/master/erreur_topo.R
It has a lot of dead code and should be cleaned up soon. I could not understand/reproduce all the errors, but I think it is a very nice setup for testing algorithms that "clean geometries". On the negative side, it only includes one or two geometries per error.

Stuff that can be improved (for later):

  • Try to organize errors by category, i.e.: polygon, polygon + hole(s), multipolygon
  • Display vertices

The Twitter post helped!

  • Ty Frazier (@syntheticpops) also mentioned terra::makeValid() and a terminal approach with ogr2ogr -skipfailures x.shp y.shp
  • @mdsumner mentioned sfdct::ct_triangulate() followed by group_by. This tweet is also very helpful for starting to understand the various approaches to this problem
  • Etienne Racine (@tiennebr) also brought up the classic buffer at 0 m, which we should add to the list
  • New Geographer mentioned v.clean in GRASS, which we already have in chapter 10

My Shiny app is starting to look not too bad. I will add more options and see how I can host it somewhere so it is accessible to others.

edit: few typos

This is awesome @defuneste, keep the ideas coming. Hope to implement some of them in time for the 2nd edition!

I have tested {prepr} (with one p I think!) and {polyclip} in the small Shiny app here (https://github.com/defuneste/utile_comme_du_pq/tree/master/topo_errors). We get very different results depending on the error and on the algorithm/implementation. Even if it is not perfect (we could expose some function arguments in the Shiny app), I will try to figure out a way to publish it later. Hosting it myself will probably take me too much time, but until then I can use the free Shiny hosting. What do you think?

How deep do you want to go in geocompr?

I think the minimum should include the two functions from {terra} and {sf} and the classic "hack" of st_buffer(x, 0). Polyclip is probably the least interesting, even if it is quite intuitive to understand how it works.

I will need to read the paper on "constrained triangulation" to understand {prepr}, but the results look good.

Next for me should be to read a bit more on how terra::makeValid() and sf::st_make_valid() work.
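As a minimal sketch of what that "minimum" could look like, the snippet below applies the three approaches to a hand-made self-intersecting "bowtie" polygon. The geometry and the object names (`bowtie`, `fixed_sf`, `fixed_buf`, `fixed_terra`) are made up for illustration, and the {terra} part is only run if the package is installed.

```r
library(sf)

# A self-intersecting "bowtie" polygon: a classic invalid geometry
# (hypothetical test case, not taken from the book)
bowtie <- st_sfc(st_polygon(list(rbind(
  c(0, 0), c(2, 2), c(2, 0), c(0, 2), c(0, 0)
))))

# Ask GEOS why the geometry is invalid
print(st_is_valid(bowtie, reason = TRUE))

# Approach 1: the repair function from {sf}
fixed_sf <- st_make_valid(bowtie)

# Approach 2: the zero-width buffer "hack"
fixed_buf <- st_buffer(bowtie, 0)

# Approach 3: the repair function from {terra} (optional dependency)
if (requireNamespace("terra", quietly = TRUE)) {
  fixed_terra <- terra::makeValid(terra::vect(st_sf(geometry = bowtie)))
  print(terra::is.valid(fixed_terra))
}

# Compare the outcomes: the approaches need not agree on the repaired area
print(st_is_valid(fixed_sf))
print(st_area(fixed_sf))
print(st_area(fixed_buf))
```

Comparing validity and area after each repair is a cheap way to spot cases where one approach silently drops part of the geometry.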

Look forward to giving this a spin, over the weekend maybe.

Well I am hosting it! : https://www.branchtwigleaf.com/shinyapps/make-valid-geom/

If it is useful I would happily move it to some geocompsomething, because I think the value is mostly pedagogical.

What I have learned from it:

  • I was surprised at the diversity of results in some cases

  • Even though {terra} and {sf} both use GEOS, they sometimes give slightly different results; my guess is that they made different implementation choices. I have no idea which is correct (if either is), but we, the geo communities, should find a way of explaining it.

  • st_buffer(geom, 0) is great but sometimes produces weird results with multipolygons or polygons with holes

  • polyclip should probably not be used outside of cases where you need a window (a similar problem to the buffer, because polyclip is a clipping tool that clips against a bigger polygon). Trouble could also come from my implementation, as it involves a lot of format conversions (sf -> polyclip -> sf)

  • st_repair: not much to say, it seems good. It is a bit heavy on the dependency side, so not for a basic user. I still have not read the paper

Edit: updating the link!

The GRASS documentation about v.clean is great and I should think of a way to add it: https://grass.osgeo.org/grass82/manuals/v.clean.html

Probably you want the "structure" option for the make valid parameters. That should give a result that is "much like buffer(0)" without the failure modes.
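For readers on the R side, here is a hedged sketch of how this choice might be made from {sf}. It assumes a recent {sf} release in which st_make_valid() has a `geos_method` argument accepting "valid_linework" and "valid_structure", and a GEOS build new enough to support the structure method; the bowtie geometry is a made-up test case.

```r
library(sf)

# Hypothetical invalid input: a self-intersecting "bowtie" polygon
bowtie <- st_sfc(st_polygon(list(rbind(
  c(0, 0), c(2, 2), c(2, 0), c(0, 2), c(0, 0)
))))

# Original GEOS method: node all rings into linework, then rebuild polygons
by_linework <- st_make_valid(bowtie, geos_method = "valid_linework")

# "Structure" method: make each ring valid, then merge shells and subtract holes
by_structure <- st_make_valid(bowtie, geos_method = "valid_structure")

# Both results should be valid; their shapes can differ for some inputs
print(st_is_valid(by_linework))
print(st_is_valid(by_structure))
print(st_equals(by_linework, by_structure, sparse = FALSE))
```

On a simple bowtie the two methods may well agree; the differences show up on inputs such as overlapping rings or holes that extend outside their shell.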

Hi @Robinlovelace, do we have a deadline on this?

I will probably need some time to understand GRASS a bit more before adding it. It could be mentioned in chapter two (explaining the concept of validity, maybe in the same place as inner rings / holes?) or later in chapter 5, but I do not see where.

The link from @pramsey was a good help in understanding the GEOS level (I will still have to try some cases and "draw" them). We still need to work out how {terra}/{sf} use it. It is hard because not everyone will be on GEOS 3.10.

Hey @defuneste, no hard deadline but sooner would be better.
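One way readers could check which GEOS their {sf} is linked against is sketched below. The 3.10 threshold is taken from the comment above, and the variable names (`geos_raw`, `geos_version`, `has_geos_310`) are hypothetical.

```r
library(sf)

# Version string of the GEOS library that {sf} was built against
geos_raw <- sf_extSoftVersion()[["GEOS"]]

# Keep only the leading "major.minor.patch" part, in case of suffixes such as "dev"
geos_version <- numeric_version(
  regmatches(geos_raw, regexpr("^[0-9]+(\\.[0-9]+)*", geos_raw))
)

# TRUE if the newer make-valid behaviour discussed above should be available
has_geos_310 <- geos_version >= "3.10"

print(geos_version)
print(has_geos_310)
```

A check like this could guard examples in the book so that results are only shown for the GEOS versions they were tested with.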