jbaileyh/geogrid

Is there a realistic limit for number of polygons

pssguy opened this issue · 8 comments

I am trying to set up some hexagonal maps for areas of Canada to visualize their recent census results

I have no problem with the code for a small number of areas, e.g. the GTA version below. Although assign_polygons() produces warnings, it works almost instantaneously.
However, the computer hangs at this stage when I run the same process with a significantly larger (roughly 20x) set of inputs (TOR, below).
Just wondering whether there is a practical limit.

NB: for this issue I have only run calculate_cell_size() with one set of options

library(cancensus)
library(sf)
library(tidyverse)
library(broom)
library(viridis)  # needed for scale_fill_viridis()
library(hexmapr)


options(cancensus.api_key = "CensusMapper_5c16da37f89e276603dd820db030d03a")

clean <- function(shape){
  # Attach polygon ids, fortify to a plotting data frame, and re-join the attributes
  shape@data$id <- rownames(shape@data)
  shape.points <- tidy(shape, region = "id")
  shape.df <- inner_join(shape.points, shape@data, by = "id")
  shape.df
}

# Greater Toronto area census sub-divisions

GTA <- get_census(dataset='CA16', regions=list(CMA="35535"),  level='CSD', geo_format = "sf") #24 obs

GTA_sp <- as(GTA, "Spatial")

GTA_shp_details <- get_shape_details(GTA_sp)


GTA_cells <- calculate_cell_size(GTA_sp, GTA_shp_details, 0.03, 'hexagonal', 1)


GTAhex <- assign_polygons(GTA_sp, GTA_cells)
#There were 25 warnings (use warnings() to see them)
warnings()
#25: In spDists(originalPoints, new_points, longlat = FALSE) :
#spDists: argument longlat conflicts with CRS(x); using the value FALSE

GTAhex <- clean(GTAhex) # 168 obs

ggplot(GTAhex, aes(long,
                   lat,
                   fill = Population,
                   group = group)) +
  geom_polygon(col = "white") +
  geom_text(aes(V1, V2, label = substr(name, 1, 4)), size = 5, color = "white") +
  scale_fill_viridis(option = "plasma") +
  coord_equal() + theme_void()

# Toronto census tracts

TOR <- get_census(dataset='CA16', regions=list(CMA="3520"), vectors=c("v_CA16_2447"), level='CT', geo_format = "sf") #572

TOR_sp <- as(TOR, "Spatial")

TOR_shp_details <- get_shape_details(TOR_sp)


TOR_cells <- calculate_cell_size(TOR_sp, TOR_shp_details, 0.03, 'hexagonal', 1)

## hangs here
TORhex <- assign_polygons(TOR_sp, TOR_cells)
TORhex <- clean(TORhex)

Hi @pssguy, as a quick response while I look into it: I think it is working (while appearing to hang), just taking a while. I have previously run it for hundreds of geospatial units and it has worked. This implementation of the algorithm is O(N^4), so it slows down considerably as the number of inputs grows.
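To put that N^4 scaling in context, here is a rough back-of-envelope comparison between the two inputs in this thread (my own arithmetic, illustrative only; real runtimes also depend on constants and the solver):

```r
# Back-of-envelope: how much longer should an O(N^4) algorithm take
# on 572 census tracts than on 24 census sub-divisions?
gta_units <- 24
tor_units <- 572

slowdown <- (tor_units / gta_units)^4
round(slowdown)  # roughly 3.2e5
```

So even if the GTA run finishes in well under a second, worst-case N^4 scaling puts the Toronto run several orders of magnitude beyond "instantaneous".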

@sassalley That explains it!
A bit unfortunate, as this method works well with a large number of divisions.
Have you seen the parlitools package? It has a hex-map function which I do not recall being so slow to apply data to.

@pssguy interesting thanks for making me aware of this - do you know which function in particular? It would be good to learn from.

In this case, I don't think the grid generation is slow; it's just the assignment algorithm that takes a while. In parlitools, are the locations already assigned?

@sassalley It's been a while since I used it. I did do a blog post on it.

The locations are already assigned: there is a fixed hex map of the parliamentary constituencies (500 or so) that you just join a data.frame to.
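For comparison, that fixed-geometry workflow amounts to a plain join; here is a minimal sketch in base R (the hexmap data frame and its column names are hypothetical stand-ins, not the actual parlitools objects):

```r
# Hypothetical pre-built hex map: six vertex rows per constituency, keyed by id
hexmap <- data.frame(
  id   = rep(c("A", "B"), each = 6),
  long = runif(12),
  lat  = runif(12)
)

# Attribute data to display on the map
stats <- data.frame(id = c("A", "B"), life_expectancy = c(81.2, 79.5))

# Because the geometry is fixed, no assignment step is needed: a join suffices
hex_with_data <- merge(hexmap, stats, by = "id", all.x = TRUE)
```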

I may be misunderstanding this, but what I am looking to do is use the calculate_cell_size() function to create a fixed set of city, province or country hex-maps, with ids for their constituent sub-divisions, that I can then just link to an appropriate data frame with, say, life expectancy, birth origin, etc.
That doesn't seem like something that should take long. I guess I'm not really understanding the purpose of assign_polygons().

@pssguy nice post!

I ran TORhex <- assign_polygons(TOR_sp, TOR_cells) overnight and it seemed to work. Here's the grid with the assignments applied: https://nofile.io/f/qsFujqc1ZTy/TORhex.RData

I think you have it correct. As you suggested, calculate_cell_size() will generate the empty hex-grid maps (placeholders, as it were) as needed, and reasonably quickly. However, deciding how to assign the real-world geographic entities to the newly created hex grid takes some time (especially for grids of 50+ geographic entities).

Once you have done this, however, you can write the fixed set of city, province or country hex-maps, with ids for their constituent sub-divisions, to file and never have to do it again. (To see the empty hex map, just evaluate TOR_shp_details[[2]].)
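One way to make the expensive assignment a one-off is to serialize the result with base R; a sketch, using a small stand-in object in place of the real assign_polygons() output:

```r
# After the slow assignment step has run once, persist the result
hex_assigned <- data.frame(id = 1:3)  # stand-in for the assign_polygons() result
saveRDS(hex_assigned, file.path(tempdir(), "TOR_hexmap.rds"))

# In any later session, reload instantly instead of recomputing
hex_assigned <- readRDS(file.path(tempdir(), "TOR_hexmap.rds"))
```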

Given our discussion, I will add a note about larger grids taking a while and potentially add a message during execution.
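As a sketch of the kind of progress message discussed here (a generic pattern of my own, not geogrid's actual internals):

```r
# Generic progress-reporting loop: emit a message every `every` iterations
run_with_progress <- function(n_units, every = 50) {
  for (i in seq_len(n_units)) {
    # ... per-unit assignment work would happen here ...
    if (i %% every == 0 || i == n_units) {
      message(sprintf("Assigned %d of %d units", i, n_units))
    }
  }
  invisible(n_units)
}
```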

@sassalley
Yes, when I thought about it, I realized the real-world assignment only needs to be done once. Looking at the parlitools documentation, that hex map was provided by a third party.

Thanks for that file. Do you know how long it took to create? Perhaps you could put a progress indicator into any revisions you make to the code.

Not sure if it is because I am unused to handling .RData files, but when I opened it locally I got a
SpatialPolygonsDataFrame with 572 obs that looks very like TOR_sp (though only 58 KB vs 144 KB).
I was expecting an sf/data.frame with many more rows, with similar columns to GTAhex.

I think it was ~3 hours, but I wasn't in a position to check (apologies). I only exported the result of TORhex <- assign_polygons(TOR_sp, TOR_cells). Once loaded, you should be able to run clean() on the SpatialPolygonsDataFrame from the .RData file.

OK, I'm good (for now at least).
Thanks for all your patience.