presto_spatial_join_blog

Profiling code for spatial joins.

Primary language: Rust. License: ISC.

Spatial Join Performance

This code compares four ways of performing a spatial join between a population dataset and the polygons of US counties:

  1. A naive double loop
  2. Using a cheap county envelope pre-check
  3. Checking a state's envelope first, then checking the state's counties.
  4. Using an RTree of the county polygons.
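The first three approaches can be sketched in a few lines of Rust. This is a minimal, hypothetical sketch, not the repository's code. The `Point`, `Envelope`, `County`, and `State` types, the `join_*` functions, and the `square` helper are illustrative names. Polygons are reduced to their outer shell, and the point-in-polygon step assumes the standard even-odd ray-casting test. The R-tree variant is left out to keep the sketch dependency-free.

```rust
// Hypothetical sketch of approaches (1)-(3); all names are illustrative and not
// taken from the repository. As in the README, only outer shells are used.

#[derive(Clone, Copy)]
struct Point { x: f64, y: f64 }

/// Axis-aligned bounding box ("envelope").
#[derive(Clone, Copy)]
struct Envelope { min_x: f64, min_y: f64, max_x: f64, max_y: f64 }

impl Envelope {
    fn of(ring: &[Point]) -> Envelope {
        let inf = f64::INFINITY;
        let mut e = Envelope { min_x: inf, min_y: inf, max_x: -inf, max_y: -inf };
        for p in ring {
            e = Envelope {
                min_x: e.min_x.min(p.x), min_y: e.min_y.min(p.y),
                max_x: e.max_x.max(p.x), max_y: e.max_y.max(p.y),
            };
        }
        e
    }

    /// Cheap check: four comparisons, regardless of polygon size.
    fn contains(&self, p: Point) -> bool {
        p.x >= self.min_x && p.x <= self.max_x && p.y >= self.min_y && p.y <= self.max_y
    }
}

/// Even-odd ray casting; cost grows with the number of shell vertices.
fn ring_contains(ring: &[Point], p: Point) -> bool {
    let mut inside = false;
    let mut j = ring.len().saturating_sub(1);
    for i in 0..ring.len() {
        let (a, b) = (ring[i], ring[j]);
        // Toggle when a ray from p toward +x crosses the edge (a, b).
        if (a.y > p.y) != (b.y > p.y) && p.x < (b.x - a.x) * (p.y - a.y) / (b.y - a.y) + a.x {
            inside = !inside;
        }
        j = i;
    }
    inside
}

struct County { shell: Vec<Point>, env: Envelope }
struct State { env: Envelope, county_ids: Vec<usize> } // indices into the county list

/// (1) Naive double loop: full point-in-polygon test against every county.
fn join_naive(points: &[Point], counties: &[County]) -> Vec<Option<usize>> {
    points.iter().map(|&p| counties.iter().position(|c| ring_contains(&c.shell, p))).collect()
}

/// (2) Reject counties by envelope before the expensive test.
fn join_envelope(points: &[Point], counties: &[County]) -> Vec<Option<usize>> {
    points.iter()
        .map(|&p| counties.iter().position(|c| c.env.contains(p) && ring_contains(&c.shell, p)))
        .collect()
}

/// (3) Check state envelopes first, then only the matching states' counties.
fn join_by_state(points: &[Point], states: &[State], counties: &[County]) -> Vec<Option<usize>> {
    points.iter()
        .map(|&p| {
            states.iter()
                .filter(|s| s.env.contains(p))
                .flat_map(|s| s.county_ids.iter().copied())
                .find(|&i| counties[i].env.contains(p) && ring_contains(&counties[i].shell, p))
        })
        .collect()
}

/// Hypothetical helper: an axis-aligned square shell.
fn square(x0: f64, y0: f64, size: f64) -> Vec<Point> {
    let (x1, y1) = (x0 + size, y0 + size);
    vec![Point { x: x0, y: y0 }, Point { x: x1, y: y0 }, Point { x: x1, y: y1 }, Point { x: x0, y: y1 }]
}

fn main() {
    // Two unit-square "counties" side by side, both inside one "state".
    let counties: Vec<County> = vec![square(0.0, 0.0, 1.0), square(1.0, 0.0, 1.0)]
        .into_iter()
        .map(|shell| County { env: Envelope::of(&shell), shell })
        .collect();
    let states = [State { env: Envelope { min_x: 0.0, min_y: 0.0, max_x: 2.0, max_y: 1.0 }, county_ids: vec![0, 1] }];
    let points = [Point { x: 0.5, y: 0.5 }, Point { x: 1.5, y: 0.5 }, Point { x: 5.0, y: 5.0 }];

    let naive = join_naive(&points, &counties);
    assert_eq!(naive, join_envelope(&points, &counties));
    assert_eq!(naive, join_by_state(&points, &states, &counties));
    println!("{:?}", naive); // [Some(0), Some(1), None]
}
```

On non-overlapping shells the three variants should agree. They differ only in how many full ray-casting tests they run. In (2), most counties are rejected after four comparisons. In (3), a state that fails its envelope check skips all of its counties at once. This is consistent with the ordering of the measured times.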

On my machine, (1) took 652.1 s, (2) took 13.8 s, (3) took 3.4 s, and (4) took 1.3 s.
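For a sense of scale, these timings can be turned into speedups over the naive loop. This is only arithmetic on the quoted numbers, which come from a single machine. The ratios therefore carry the same caveats as the timings. `speedup` is a hypothetical helper, not part of the repository.

```rust
// Speedups implied by the timings quoted above (seconds, one machine).
// `speedup` is a hypothetical helper for illustration only.
fn speedup(naive_secs: f64, other_secs: f64) -> f64 {
    naive_secs / other_secs
}

fn main() {
    let naive = 652.1;
    let others = [("(2) envelope pre-check", 13.8), ("(3) state, then county", 3.4), ("(4) RTree", 1.3)];
    for (name, secs) in others {
        // {:.0} rounds the ratio to the nearest whole number.
        println!("{name}: ~{:.0}x faster than (1)", speedup(naive, secs));
    }
}
```

Run as-is, it reports roughly 47x, 192x, and 502x.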

These results are not deeply rigorous, nor are the algorithms particularly optimized. Additionally, only the outer shells of the polygons are used; holes are ignored completely. The measurements are only intended to give order-of-magnitude results.
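The following sketch makes the shell-only simplification concrete. When holes are ignored, a point that falls inside a hole is still attributed to the enclosing polygon. The code assumes an even-odd ray-casting containment test; the repository's point-in-polygon routine may differ. `contains_shell_only`, `contains_with_holes`, and `square` are hypothetical names.

```rust
// Hypothetical sketch: shell-only containment vs. containment that respects holes.
#[derive(Clone, Copy)]
struct Point { x: f64, y: f64 }

/// Even-odd ray-casting test against a single closed ring.
fn ring_contains(ring: &[Point], p: Point) -> bool {
    let mut inside = false;
    let mut j = ring.len().saturating_sub(1);
    for i in 0..ring.len() {
        let (a, b) = (ring[i], ring[j]);
        if (a.y > p.y) != (b.y > p.y) && p.x < (b.x - a.x) * (p.y - a.y) / (b.y - a.y) + a.x {
            inside = !inside;
        }
        j = i;
    }
    inside
}

/// The simplification described above: only the outer shell is tested.
fn contains_shell_only(shell: &[Point], p: Point) -> bool {
    ring_contains(shell, p)
}

/// Full test: inside the shell and outside every hole.
fn contains_with_holes(shell: &[Point], holes: &[Vec<Point>], p: Point) -> bool {
    ring_contains(shell, p) && !holes.iter().any(|h| ring_contains(h, p))
}

fn square(x0: f64, y0: f64, size: f64) -> Vec<Point> {
    let (x1, y1) = (x0 + size, y0 + size);
    vec![Point { x: x0, y: y0 }, Point { x: x1, y: y0 }, Point { x: x1, y: y1 }, Point { x: x0, y: y1 }]
}

fn main() {
    let shell = square(0.0, 0.0, 4.0); // 4x4 outer shell
    let holes = vec![square(1.0, 1.0, 2.0)]; // 2x2 hole in the middle
    let in_hole = Point { x: 2.0, y: 2.0 };
    let in_ring = Point { x: 0.5, y: 0.5 };

    assert!(contains_shell_only(&shell, in_hole)); // hole ignored: counted as inside
    assert!(!contains_with_holes(&shell, &holes, in_hole));
    assert!(contains_with_holes(&shell, &holes, in_ring));
    println!("point in hole: shell-only = true, with holes = false");
}
```

Respecting holes adds one ring test per hole. Because of the `&&` short-circuit, that extra test only runs for points already inside the shell.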

To run the performance measurements

  1. Install Rust
  2. Clone this repo.
  3. In the root directory, run cargo build --release && target/release/presto_spatial_join_blog

The brute-force calculation can take over 10 minutes, so in the meantime, watch a video from Lessons from the Screenplay.

Acknowledgements

Population centers come from Facebook's Population Density Maps.

County and state GeoJSON files come from Eric Celeste, who sourced the data from the US Census Bureau.