/modcon23-contest

(mojo) Spatial envelope optimization and benchmarks. Finalist in ModCon23 contest.

Primary LanguageMojoMIT LicenseMIT

Spatial envelope optimization and benchmarks

Run Tests

A Mojo🔥 project created for the ModCon23 contest. It was Finalist #6.

Calculates the spatial envelope, and explores the performance of Python, NumPy, and Mojo. The Mojo final implementation is 20X to 600X faster than Python, and 1.3X to 4X faster than NumPy.

Envelope

Calculating an envelope is a fundamental part of spatial analysis. The envelope (aka: bounds, bbox, mbr) is usually defined by an xmin, ymin, xmax, and ymax representing the minimum and maximum x (longitude) and y (latitude) coordinates that encompass the bounded feature(s).

bounding box

Figure attribution: QGIS documentation: Fig. 27.53 Black lines represent the bounding boxes of each polygon feature.

Variants benchmarked

Variants also considered

The Shapely package wraps the GEOS library, another well-optimized C/C++ codebase. Shapely caches the envelope upon geometry creation, so it was not feasible to benchmark the envelope calculations separately from geometry constructors. For that reason, Shapely was not included in the benchmarks.

All benchmarks

Test system: mojo 0.5.0 on Apple M2, 24GB RAM. Data type: float32.

overall benchmarks raw results spreadsheet

Chart of optimized variants only

optimized benchmarks

Key Findings

  1. Mojo optimized "a" is the best overall performer. However for large feature spaces (millions of points) we can get at least an additional 25% speedup by using one thread per dimension, as shown in Mojo optimized "b".

  2. Even more performance optimizations are possibly left on the table, such as auto-tuning, stack allocation, and tiled/striped memory access. A fusion of Mojo optimized "a" and "b" could offer the best performance across all feature sizes.

  3. In addition to being performance winners, the Mojo variants are parameterized by the number of dimensions (dims) and by data type (dtype). In other words, the same generic code can run, for example, int16, float16, float64, and with 3, 4 or more dimensions. In GIS systems the number of dimensions is sometimes referred to as XY, XYZ, or XYZM, where Z is "height", and M is "measure".

Example output including Mojo's benchmark report

$ mojo mojo_impl/optimized_a.mojo 100
float32 100
---------------------
Benchmark Report (s)
---------------------
Mean: 6.5356175103213615e-07
Total: 0.75083200000000005
Iters: 1148831
Warmup Mean: 9.9999999999999995e-07
Warmup Total: 1.9999999999999999e-06
Warmup Iters: 2
Fastest Mean: 6.4460000000000004e-07
Slowest Mean: 7.9999999999999996e-07

ns: 653.56175103213616
microsecs: 0.6535617510321361
ms: 0.0006535617510321361
s: 6.5356175103213615e-07