Is there a realistic limit for the number of polygons?
pssguy opened this issue · 8 comments
I am trying to set up some hexagonal maps for areas of Canada to visualize their recent census results.
The code works fine with a small number of areas, e.g. the GTA version below. Although assign_polygons() produces warnings, it completes almost instantaneously.
However, the computer hangs at the same stage when I run the process on a set of inputs roughly 20× larger (TOR).
Just wondering whether there is a practical limit
NB: for this issue I have run calculate_cell_size() with just one option.
library(cancensus)
library(sf)
library(tidyverse)
library(broom)
library(hexmapr)
library(viridis) # needed for scale_fill_viridis()
options(cancensus.api_key = "CensusMapper_5c16da37f89e276603dd820db030d03a")
clean <- function(shape){
  shape@data$id <- rownames(shape@data)        # add an id column for joining
  shape.points <- tidy(shape, region = "id")   # flatten polygons to a data frame
  shape.df <- inner_join(shape.points, shape@data, by = "id")  # reattach attributes
  shape.df
}
# Greater Toronto area census sub-divisions
GTA <- get_census(dataset='CA16', regions=list(CMA="35535"), level='CSD', geo_format = "sf") #24 obs
GTA_sp <- as(GTA, "Spatial")
GTA_shp_details <- get_shape_details(GTA_sp)
GTA_cells <- calculate_cell_size(GTA_sp, GTA_shp_details,0.03, 'hexagonal', 1)
GTAhex <- assign_polygons(GTA_sp,GTA_cells)
#There were 25 warnings (use warnings() to see them)
warnings()
#25: In spDists(originalPoints, new_points, longlat = FALSE) :
#spDists: argument longlat conflicts with CRS(x); using the value FALSE
GTAhex<- clean(GTAhex) #168 obs
ggplot(GTAhex, aes(long, lat, fill = Population, group = group)) +
  geom_polygon(col = "white") +
  geom_text(aes(V1, V2, label = substr(name, 1, 4)), size = 5, color = "white") +
  scale_fill_viridis(option = "plasma") +
  coord_equal() +
  theme_void()
# Toronto census tracts
TOR <- get_census(dataset='CA16', regions=list(CMA="3520"), vectors=c("v_CA16_2447"), level='CT', geo_format = "sf") #572
TOR_sp <- as(TOR, "Spatial")
TOR_shp_details <- get_shape_details(TOR_sp)
TOR_cells <- calculate_cell_size(TOR_sp, TOR_shp_details,0.03, 'hexagonal', 1)
## hangs here
TORhex <- assign_polygons(TOR_sp,TOR_cells)
TORhex<- clean(TORhex)
Hi @pssguy, as a quick response while I look into it: I think it is working (while appearing to hang), just taking a while. I have run it for hundreds of geospatial units previously and it has worked. This implementation of the algorithm is O(N^4), so it slows down sharply as the number of inputs grows.
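For intuition on what O(N^4) growth means here, a quick back-of-envelope sketch (the 24 and 572 unit counts come from the code above; this estimates only the *relative* work, not actual timings):

```r
# O(N^4) growth: doubling the number of units multiplies the work by 2^4 = 16.
relative_cost <- function(n, n_ref) (n / n_ref)^4

relative_cost(48, 24)   # 2x the units -> 16x the work
relative_cost(572, 24)  # TOR (572 CTs) vs GTA (24 CSDs) -> roughly 3.2e5x the work
```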
@sassalley That explains it!
A bit unfortunate, as this method works well with a large number of divisions.
Have you seen the parlitools package? That has a hex-map function which I do not recall being so slow to apply data to.
@pssguy Interesting, thanks for making me aware of this. Do you know which function in particular? It would be good to learn from.
In this case, I don't think the grid generation is slow; it's just the assignment algorithm that takes a while. In parlitools, are the locations already assigned?
@sassalley It's been a while since I used it. I did do a blog post on it.
The locations are already assigned: there is a fixed hex map of the parliamentary constituencies (500-ish) that you just join a data.frame to.
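From memory, the parlitools workflow is roughly like this (I think the map object is called west_hex_map and is keyed by a gss_code column, but check the package docs; my_df here is a hypothetical data frame of your own):

```r
library(parlitools)  # assumed to provide west_hex_map, a fixed constituency hex map
library(dplyr)

# my_df: hypothetical data frame with one row per constituency
# and a gss_code column to join on
hex_data <- left_join(west_hex_map, my_df, by = "gss_code")
```

No assignment step is needed at all, because the hex positions were worked out once, up front.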
I may be misunderstanding this, but what I am looking to do is use the calculate_cell_size() function to create a fixed set of city, province, or country hex-maps with ids for their constituent sub-divisions, which I can then just link to an appropriate data frame with, say, life expectancy, birth origin, etc.
That doesn't seem like something that should take long. I guess I'm not really understanding the purpose of assign_polygons().
@pssguy nice post!
I ran TORhex <- assign_polygons(TOR_sp, TOR_cells)
overnight and it seemed to work. Here's the grid with the assignments applied: https://nofile.io/f/qsFujqc1ZTy/TORhex.RData
I think you have it correct. As you suggested, calculate_cell_size()
will generate the empty hex-grid maps (placeholders, as it were) as needed, and reasonably quickly. However, deciding where to assign the real-world geographic entities in the newly created hex grid will take some time (especially for grids of 50+ geographic entities).
Once you have done this, however, you can write the fixed set of city, province, or country hex-maps with ids for their constituent sub-divisions to file and never have to do it again. (If you want to see the empty hex map, just evaluate TOR_shp_details[[2]].)
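In other words, the expensive step only ever needs to run once per map; a minimal caching sketch (the file name is just an example):

```r
# One-off: run the slow assignment, then write the result to disk
TORhex <- assign_polygons(TOR_sp, TOR_cells)
saveRDS(TORhex, "TOR_hexmap.rds")

# Any later session: reload the fixed hex map instantly and join fresh data to it
TORhex <- readRDS("TOR_hexmap.rds")
```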
Given our discussion, I will add a note about larger grids taking a while and potentially add a message during execution.
@sassalley
Yes, when I thought about it, I realized the need to do the real-world assignment as a one-off. Just looking at the parlitools documentation, that map was provided by a third party.
Thanks for that file. Do you know how long it took to create? Perhaps you could put a progress indicator in any revisions you make to the code.
Not sure if it is because I am unused to handling .RData files, but when I opened it locally I got a SpatialPolygonsDataFrame with 572 obs that looks very like TOR_sp (though only 58 KB vs 144 KB).
I was expecting an sf/data.frame with many more rows, with similar columns to GTAhex.
I think it was ~3 hours, but I wasn't in a position to check (apologies). I only exported the result of TORhex <- assign_polygons(TOR_sp, TOR_cells). Hopefully, once it is loaded, you should be able to run clean() on the SpatialPolygonsDataFrame from the .RData file.
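Assuming the exported object is named TORhex (I haven't inspected the file), loading and cleaning would look like:

```r
load("TORhex.RData")        # should restore a SpatialPolygonsDataFrame called TORhex
TORhex_df <- clean(TORhex)  # the same clean() helper defined earlier in the thread
```

That should give the long-format data frame (many rows per hexagon) you were expecting, ready for the same ggplot call used for GTAhex.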
Ok, I'm good (for now at least).
Thanks for all your patience!