A Mojo🔥 project created for the ModCon23 contest. It was Finalist #6.
Calculates the spatial envelope, and explores the performance of Python, NumPy, and Mojo. The Mojo final implementation is 20X to 600X faster than Python, and 1.3X to 4X faster than NumPy.
Calculating an envelope is a fundamental part of spatial analysis. The envelope (aka: bounds, bbox, mbr) is usually defined by an xmin, ymin, xmax, and ymax representing the minimum and maximum x (longitude) and y (latitude) coordinates that encompass the bounded feature(s).
Figure attribution: QGIS documentation: Fig. 27.53 Black lines represent the bounding boxes of each polygon feature.
- naïve Python
- naïve Mojo
- Python using NumPy (well-optimized C code)
- Mojo optimized with vectorization and loop unrolling, single-threaded (mojo optimized "a")
- Mojo optimized with parallelization, vectorization and loop unrolling. (mojo optimized "b")
The Shapely package wraps the GEOS library, another well-optimized C/C++ codebase. Shapely caches the envelope upon geometry creation, so it was not feasible to benchmark the envelope calculations separately from geometry constructors. For that reason, Shapely was not included in the benchmarks.
Test system: mojo 0.5.0
on Apple M2, 24GB RAM. Data type: float32
.
-
Mojo optimized "a" is the best overall performer. However for large feature spaces (millions of points) we can get at least an additional 25% speedup by using one thread per dimension, as shown in Mojo optimized "b".
-
Even more performance optimizations are possibly left on the table, such as auto-tuning, stack allocation, and tiled/striped memory access. A fusion of Mojo optimized "a" and "b" could offer the best performance across all feature sizes.
-
In addition to being performance winners, the Mojo variants are parameterized by the number of dimensions (
dims
) and by data type (dtype
). In other words, the same generic code can run, for example,int16
,float16
,float64
, and with 3, 4 or more dimensions. In GIS systems the number of dimensions is sometimes referred to as XY, XYZ, or XYZM, where Z is "height", and M is "measure".
$ mojo mojo_impl/optimized_a.mojo 100
float32 100
---------------------
Benchmark Report (s)
---------------------
Mean: 6.5356175103213615e-07
Total: 0.75083200000000005
Iters: 1148831
Warmup Mean: 9.9999999999999995e-07
Warmup Total: 1.9999999999999999e-06
Warmup Iters: 2
Fastest Mean: 6.4460000000000004e-07
Slowest Mean: 7.9999999999999996e-07
ns: 653.56175103213616
microsecs: 0.6535617510321361
ms: 0.0006535617510321361
s: 6.5356175103213615e-07