Improve runtime for small areas
Planetiler takes ~30 seconds to run even for the smallest areas (like the Andorra extract from Geofabrik). Let's see if there is any way to improve that. Here's a summary of the runtime over Andorra:
0:00:33 INF - overall 33s cpu:1m12s gc:3s avg:2.2
0:00:33 INF - lake_centerlines 3s cpu:12s gc:1s avg:4.4
0:00:33 INF -   read 1x(35% 0.9s done:2s)
0:00:33 INF -   process 9x(1% 0s wait:1s done:2s)
0:00:33 INF -   write 1x(0% 0s wait:1s done:2s)
0:00:33 INF - water_polygons 12s cpu:17s avg:1.4
0:00:33 INF -   read 1x(94% 12s)
0:00:33 INF -   process 9x(0% 0s wait:12s)
0:00:33 INF -   write 1x(0% 0s wait:12s)
0:00:33 INF - natural_earth 11s cpu:14s avg:1.3
0:00:33 INF -   read 1x(66% 7s done:4s)
0:00:33 INF -   process 9x(2% 0.2s wait:7s done:4s)
0:00:33 INF -   write 1x(0% 0s wait:8s done:4s)
0:00:33 INF - osm_pass1 0.4s cpu:2s avg:3.7
0:00:33 INF - osm_pass2 1s cpu:5s avg:5
0:00:33 INF -   read 1x(0% 0s)
0:00:33 INF -   process 9x(31% 0.3s)
0:00:33 INF -   write 1x(3% 0s wait:1s)
0:00:33 INF - boundaries 0s cpu:0.1s avg:2.9
0:00:33 INF - sort 0.1s cpu:0.7s avg:7.2
0:00:33 INF - archive 0.5s cpu:3s avg:5.6
The biggest costs are Natural Earth (11s) and water polygons (12s), which together account for 23 of the 33 seconds, because Planetiler has to deserialize every feature for the whole planet even when the output area is tiny.
One idea would be to switch Natural Earth to read the GeoPackage source and use its built-in spatial index to read only the features that fall inside the bounding box.
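To illustrate the bounding-box idea, here is a minimal sketch of the kind of query a GeoPackage spatial index allows. It is not Planetiler code. It uses Python's `sqlite3` on an in-memory database that mimics the GeoPackage R-tree convention (an `rtree_<table>_<column>` table with `id, minx, maxx, miny, maxy`). The table name `lakes`, the feature names, and the coordinates are made up for the example.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory stand-in for a GeoPackage with one feature table."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE lakes (fid INTEGER PRIMARY KEY, name TEXT)")
    try:
        # The GeoPackage rtree extension keeps one bounding box per feature id.
        db.execute(
            "CREATE VIRTUAL TABLE rtree_lakes_geom "
            "USING rtree(id, minx, maxx, miny, maxy)"
        )
    except sqlite3.OperationalError:
        # Fallback for SQLite builds without the R*Tree module: same columns,
        # no index, so the query below still works but scans every row.
        db.execute(
            "CREATE TABLE rtree_lakes_geom (id INTEGER PRIMARY KEY, "
            "minx REAL, maxx REAL, miny REAL, maxy REAL)"
        )
    # Made-up features: (fid, name, minx, maxx, miny, maxy)
    demo = [
        (1, "inside_bbox", 1.50, 1.60, 42.50, 42.55),
        (2, "elsewhere_in_europe", 6.15, 6.93, 46.20, 46.52),
        (3, "other_hemisphere", 31.5, 34.9, -3.1, 0.5),
    ]
    for fid, name, minx, maxx, miny, maxy in demo:
        db.execute("INSERT INTO lakes VALUES (?, ?)", (fid, name))
        db.execute(
            "INSERT INTO rtree_lakes_geom VALUES (?, ?, ?, ?, ?)",
            (fid, minx, maxx, miny, maxy),
        )
    return db


def features_in_bbox(db, min_lon, min_lat, max_lon, max_lat):
    """Return only the features whose bounding box intersects the query box."""
    query = """
        SELECT f.fid, f.name
        FROM lakes AS f
        JOIN rtree_lakes_geom AS r ON f.fid = r.id
        WHERE r.maxx >= ? AND r.minx <= ? AND r.maxy >= ? AND r.miny <= ?
        ORDER BY f.fid
    """
    return db.execute(query, (min_lon, max_lon, min_lat, max_lat)).fetchall()
```

With a rough Andorra-sized box such as `features_in_bbox(build_demo_db(), 1.41, 42.43, 1.79, 42.66)`, only the first demo feature is returned, so geometries outside the box never need to be deserialized.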
I'm not sure if we could do something similar with water polygons, since they are just a zipped shapefile with .shp and .shx files but no .sbn or .sbx spatial index. If we convert it to a different format we could add an index, but that complicates things quite a bit since we could no longer just download directly from the source.
cc/ @bdon
At the most extreme, we could define a ReadableTileArchive as another input type that is passed directly as tiled features, without touching the FeatureCollector API; the OSM- or NE-derived ocean is going to be exactly the same for every Planetiler output, modulo tags/buffer sizes. That would make water polygons cost effectively nothing.
Otherwise we might be able to read the shapefile index if one is included for water polygons, or migrate to another indexed format for it (GeoPackage, FlatGeobuf?).
Another hybrid option might be to compute a spatial index the first time we read a file and use it to speed up subsequent reads?
Or ask the maintainers of the water polygons source if they'd be up for distributing it in GeoPackage format or adding a spatial index.
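The hybrid option of computing an index on the first read could look roughly like the sketch below. It relies on the shapefile layout, where each Polygon record stores its bounding box right after the shape type, so a scan can collect `(offset, bbox)` pairs without decoding any points. The sidecar file name (`.bbox.json`), the helper names, and the size/mtime invalidation rule are all assumptions for illustration, not anything Planetiler does today.

```python
import json
import os
import struct


def polygon_record(rec_no, xmin, ymin, xmax, ymax):
    """Encode one rectangular Polygon (shape type 5) record for the demo file."""
    ring = [(xmin, ymin), (xmin, ymax), (xmax, ymax), (xmax, ymin), (xmin, ymin)]
    # Shape type, bounding box, number of parts, number of points (little-endian).
    content = struct.pack("<i4d2i", 5, xmin, ymin, xmax, ymax, 1, len(ring))
    content += struct.pack("<i", 0)  # the single part starts at point 0
    for x, y in ring:
        content += struct.pack("<2d", x, y)
    # Record header is big-endian; content length is counted in 16-bit words.
    return struct.pack(">2i", rec_no, len(content) // 2) + content


def demo_shp_bytes():
    """Tiny synthetic .shp: only the file code is set in the 100-byte header."""
    header = struct.pack(">i", 9994).ljust(100, b"\x00")
    return (header
            + polygon_record(1, 1.0, 42.0, 2.0, 43.0)
            + polygon_record(2, -10.0, -10.0, -5.0, -5.0))


def scan_bboxes(shp_bytes):
    """Collect [offset, xmin, ymin, xmax, ymax] per polygon, skipping the points."""
    records = []
    pos = 100  # the main file header is always 100 bytes
    while pos < len(shp_bytes):
        _rec_no, words = struct.unpack_from(">2i", shp_bytes, pos)
        shape_type = struct.unpack_from("<i", shp_bytes, pos + 8)[0]
        if shape_type == 5:
            records.append([pos, *struct.unpack_from("<4d", shp_bytes, pos + 12)])
        pos += 8 + 2 * words
    return records


def load_or_build_index(shp_path):
    """Return (records, came_from_cache); rebuild if the file size/mtime changed."""
    stat = os.stat(shp_path)
    stamp = [stat.st_size, stat.st_mtime_ns]
    sidecar = shp_path + ".bbox.json"  # hypothetical sidecar name
    if os.path.exists(sidecar):
        with open(sidecar) as f:
            cached = json.load(f)
        if cached["stamp"] == stamp:
            return cached["records"], True
    with open(shp_path, "rb") as f:
        records = scan_bboxes(f.read())
    with open(sidecar, "w") as f:
        json.dump({"stamp": stamp, "records": records}, f)
    return records, False


def offsets_in_bbox(records, bbox):
    """Byte offsets of records whose box intersects (min_x, min_y, max_x, max_y)."""
    qx0, qy0, qx1, qy1 = bbox
    return [off for off, x0, y0, x1, y1 in records
            if x1 >= qx0 and x0 <= qx1 and y1 >= qy0 and y0 <= qy1]
```

The first run still pays for one pass over the file, but it only touches record headers; later runs can seek straight to the offsets returned by `offsets_in_bbox`. A linear filter over the cached boxes is used here for brevity, and a real index would likely use a tree or grid.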
Another piece of low-hanging fruit would be to keep the unzipped file contents around between runs.
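As a rough sketch of what keeping unzipped contents between runs could mean, the snippet below extracts an archive into a cache directory keyed by the zip's name, size, and modification time, and skips extraction when a completed copy already exists. The function name, the cache layout, and the `.complete` marker file are hypothetical choices for this example.

```python
import os
import zipfile


def cached_unzip(zip_path, cache_root):
    """Extract zip_path once per (name, size, mtime); return (dir, was_cached)."""
    stat = os.stat(zip_path)
    key = f"{os.path.basename(zip_path)}-{stat.st_size}-{stat.st_mtime_ns}"
    target = os.path.join(cache_root, key)
    marker = os.path.join(target, ".complete")
    if os.path.exists(marker):
        # A previous run finished extracting this exact archive.
        return target, True
    os.makedirs(target, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target)
    # Written last so an interrupted extraction is never mistaken for a full one.
    with open(marker, "w"):
        pass
    return target, False
```

Keying on size and mtime means a re-downloaded archive gets a fresh directory, at the cost of leaving stale directories behind that would need occasional cleanup.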
Context on Geopackage etc output from osmcoastline: osmcode/osmcoastline#35 (comment)
For the Natural Earth / GeoPackage case, we can also have profiles declare the limited set of tables that they're interested in and skip reading features that won't be processed.
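A minimal sketch of that table-filtering idea, again in Python against an in-memory SQLite database rather than Planetiler's actual reader: the set of declared tables is intersected with the feature tables listed in `gpkg_contents`, and only those are read. The `gpkg_contents` table here is reduced to two columns, and the table names and rows are illustrative.

```python
import sqlite3


def build_demo_db():
    """In-memory stand-in for a GeoPackage holding two feature tables."""
    db = sqlite3.connect(":memory:")
    # Simplified: a real gpkg_contents table has more columns than these two.
    db.execute("CREATE TABLE gpkg_contents (table_name TEXT, data_type TEXT)")
    for table in ("ne_10m_lakes", "ne_10m_populated_places"):
        db.execute(f'CREATE TABLE "{table}" (fid INTEGER PRIMARY KEY, name TEXT)')
        db.execute("INSERT INTO gpkg_contents VALUES (?, 'features')", (table,))
    db.execute("INSERT INTO ne_10m_lakes (name) VALUES ('lake_a'), ('lake_b')")
    db.execute("INSERT INTO ne_10m_populated_places (name) VALUES ('town_a')")
    return db


def feature_tables(db):
    """Names of all feature tables the file says it contains."""
    rows = db.execute(
        "SELECT table_name FROM gpkg_contents WHERE data_type = 'features'"
    )
    return {name for (name,) in rows}


def read_declared(db, declared):
    """Yield (table, row) only for tables a profile declared and the file has."""
    # Names are taken from gpkg_contents, so undeclared or missing tables are
    # never queried at all.
    for table in sorted(set(declared) & feature_tables(db)):
        for row in db.execute(f'SELECT fid, name FROM "{table}" ORDER BY fid'):
            yield table, row
```

For example, a profile that declares `{"ne_10m_lakes", "ne_110m_ocean"}` against this demo file would read the two lake rows and nothing else: the populated-places table is skipped because it was not declared, and the ocean table is skipped because the file does not contain it.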