mapbox/vtquery

optimizations (now and future)

Closed this issue · 4 comments

This is a running list of optimizations to discuss/implement

  • dedupe based on ID in the feature if it exists
  • save a vector of data_views from vtzero once
  • use vtzero to loop through and run geometry against closest_point - add data_view to vector for usage later, and keep moving
  • convert "radius" to tile coordinates
  • work within integers instead of doubles for geometry.hpp
  • determine where to load tile buffers - probably in javascript and pass an array of buffers in?
  • get feature properties from data_views once loop is finished (currently saved as variants until part of the results object)
  • option to filter results by geometry type (e.g. "only give me points")
  • how to handle "radius" across tiles? Keep the origin relative to the current tile: if a tile does not contain the origin point, the origin value increases past the tile extent (or goes negative). This gives distances as real numbers, rather than interpolating based on a relative origin
  • "sort during" instead of "sort after" architecture
  • bypass closest point when radius=0 and only use boost::geometry::within AND only work with polygons (open question - how to handle exact intersections of query point along lines/boundaries?)
  • non-copyable ResultsObject
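
The two "radius" items above can be sketched together. This is a minimal illustration, not vtquery's implementation: the helper names, the default extent of 4096, and the Web Mercator circumference constant are all assumptions made for the example. It converts a radius in meters to tile units at a given zoom and latitude, and re-expresses the query origin in a neighboring tile's coordinate space so that distances stay in plain integer tile units.

```cpp
#include <cmath>
#include <cstdint>

// Hypothetical helpers for illustration only; none of these names come from vtquery.
constexpr double kEarthCircumferenceM = 40075016.686; // Web Mercator equatorial circumference (assumed)
constexpr double kPi = 3.14159265358979323846;

// Convert a radius in meters to tile units at the given latitude and zoom.
// One tile spans circumference * cos(lat) / 2^zoom meters on the ground.
inline double radius_to_tile_units(double radius_m, double lat_deg,
                                   std::uint32_t zoom, std::uint32_t extent = 4096) {
    const double meters_per_tile = kEarthCircumferenceM
        * std::cos(lat_deg * kPi / 180.0)
        / std::ldexp(1.0, static_cast<int>(zoom));
    return radius_m * static_cast<double>(extent) / meters_per_tile;
}

struct TilePoint {
    std::int64_t x;
    std::int64_t y;
};

// Re-express an origin (given in the coordinates of tile origin_tx/origin_ty)
// in the coordinate space of another tile at the same zoom. For a tile east of
// the origin tile the x value goes negative; for a tile to the west it grows
// past the extent.
inline TilePoint origin_in_tile(TilePoint origin,
                                std::int64_t origin_tx, std::int64_t origin_ty,
                                std::int64_t tile_tx, std::int64_t tile_ty,
                                std::int64_t extent = 4096) {
    return {origin.x - (tile_tx - origin_tx) * extent,
            origin.y - (tile_ty - origin_ty) * extent};
}
```

With the origin shifted this way, the same integer distance comparison can run against every tile's geometry without converting feature coordinates back to longitude/latitude.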

Working on the bench branch febd83a and testing against node-mapnik vt.query to get a sense of how far we have to go. Still have a ways to go! 😄 Some preliminary results:

Mapnik

Benchmark speed: 754 runs/s (runs:1000 ms:1327 )
Benchmark iterations: 1000 concurrency: 2

Vtquery

Benchmark speed: 307 runs/s (runs:1000 ms:3261 )
Benchmark iterations: 1000 concurrency: 2

Vtquery (no sort)

Benchmark speed: 506 runs/s (runs:1000 ms:1978 )
Benchmark iterations: 1000 concurrency: 2

So, mapnik vt is still faster, but we haven't started optimizing yet. Just removing the std::sort call speeds up vtquery significantly (307 to 506 runs/s), so I'd say this is a good place to start.
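
One way to read the "sort during" idea is to keep a small, already-sorted buffer of the closest hits while looping, instead of collecting everything and calling std::sort at the end. The sketch below is an assumption about what that could look like, not the code that landed in vtquery; `Hit`, `ClosestHits`, and the result limit are hypothetical names.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical result record: distance from the query point plus a feature id.
struct Hit {
    double distance;
    std::uint64_t id;
};

// Keeps at most `limit` hits, ordered by ascending distance at all times.
class ClosestHits {
public:
    explicit ClosestHits(std::size_t limit) : limit_(limit) { hits_.reserve(limit); }

    void offer(Hit hit) {
        if (limit_ == 0) return;
        if (hits_.size() == limit_) {
            // Buffer is full: reject anything not closer than the current worst hit.
            if (hit.distance >= hits_.back().distance) return;
            hits_.pop_back();
        }
        // Binary search for the insertion point so the vector stays sorted.
        auto pos = std::upper_bound(
            hits_.begin(), hits_.end(), hit,
            [](const Hit& a, const Hit& b) { return a.distance < b.distance; });
        hits_.insert(pos, hit);
    }

    const std::vector<Hit>& sorted() const { return hits_; }

private:
    std::size_t limit_;
    std::vector<Hit> hits_;
};
```

For small limits, most candidates are rejected by a single comparison against the current worst hit, so the per-feature cost stays low and no final sort is needed.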

@artemp made a great point today that vtquery (with closest_point) is actually returning proper information from polygons, which is not the case in mapnik vt.query. Therefore we'll never be comparing 🍎 to 🍎

Adding:

  • What does profiling say the biggest bottleneck is? Is the biggest bottleneck something that is extra functionality in vtzero (like closest_point) or is it something else which is the same functionality as in node-mapnik, just slower?
  • I noticed we are not currently 🍎 to 🍎 in the other direction (e.g. vtzero doing less work than node-mapnik) because #18 is not done. Before more comparisons I think it is important to finish sending feature properties back to JS land, since that could be expensive. And because it is expensive, see "Be more lazy" at #30 (perhaps that is the same idea you had in mind above with "use vtzero to loop through and run geometry against closest_point - add data_view to vector for usage later, and keep moving"?)
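
The "be lazy" idea can be illustrated without vtzero itself. In this sketch, `std::string_view` stands in for a non-owning view such as vtzero's data_view, and `decode_properties` is a hypothetical placeholder for the expensive property decoding; neither reflects vtquery's actual code. The loop only records a view and a distance, and decoding runs afterwards for the features that survived the radius check.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// A candidate keeps a non-owning view of the undecoded feature bytes.
// The underlying buffer must outlive the candidate.
struct Candidate {
    std::string_view raw;
    double distance;
};

// Hypothetical stand-in for expensive property decoding; counts how often it runs.
inline std::string decode_properties(std::string_view raw, std::size_t& decode_calls) {
    ++decode_calls;
    return std::string(raw);
}

// First pass: cheap distance filter that stores views only.
// Second pass: decode properties for the surviving candidates.
inline std::vector<std::string> query_lazy(const std::vector<std::string>& features,
                                           const std::vector<double>& distances,
                                           double radius,
                                           std::size_t& decode_calls) {
    std::vector<Candidate> kept;
    for (std::size_t i = 0; i < features.size(); ++i) {
        if (distances[i] <= radius) {
            kept.push_back({features[i], distances[i]});
        }
    }
    std::vector<std::string> decoded;
    decoded.reserve(kept.size());
    for (const Candidate& c : kept) {
        decoded.push_back(decode_properties(c.raw, decode_calls));
    }
    return decoded;
}
```

The trade-off is lifetime management: the views are only valid while the tile buffers they point into are kept alive.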

Closing - these have been implemented in #42 and #39