JuliaGeometry/Meshes.jl

Performance improvements 🚀

juliohm opened this issue · 15 comments

This issue tracks our efforts to improve the performance of different algorithms and data structures implemented in the project. The public API is stabilizing and there is only one major breaking change planned before a possible v1.0.

Given the solid design that separates user-facing functions from internals, we can easily refactor the code without breaking code downstream. We kindly ask the community to add comments here whenever they encounter a performance problem.

Copying from Zulip:

One slightly breaking change I would consider is changing the vararg constructors of geometries to use static arrays (possibly mutable). This would make it much easier to write performant code that handles a lot of small geometries (e.g. segments or triangles) and avoid code such as Segment(view(v, [i, i + 1])) that currently exists (which in this case allocates the index vector plus possibly the view, afaik). I think I also have a general expectation that the arguments of a constructor are stored within the struct and not allocated, so having Triangle(a, b, c) allocate [a, b, c] is rather surprising to me and having expressions such as centroid(Segment(a, b)) and area(Triangle(a, b, c)) perform well would be very desirable imo. The only drawback that I see is that the number of points becomes fixed, which is arguably a good thing for Ngon but maybe not in the case of Chain/Bezier (though constructing them with a regular vector would still be possible of course).

comment by @mfsch

That is something we have in mind. In the past we had considered StaticVector for the list of vertices inside some of the Polytope types, but refactored and erased this. We may need to reintroduce this optimization and use NTuple directly whenever possible. Some of our Polytope types like PolyArea and Chain's subtypes require dynamic vectors due to the use cases.

Related to this, I found another potential bottleneck: segments() function for chains is slower by about a factor of 6 than the "manual" version with StaticArrays:
This small change had an improvement from 30 to 5 seconds in my case:

# @views for seg in segments(chain)
@views for (p1, p2) in zip(verts[1:n-1], verts[2:n])
        seg = Segment(SA[p1, p2])
       .... do stuff ....
end
mfsch commented

Switching Segment(view(v, [i, i + 1])) to Segment(SVector(v[i], v[i+1])) in the segments function should correspond to what @BloodWorkXGaming did, but as I wrote I think it would be more useful to make Segment(a, b) expand to Segment(SVector(a, b)) (and similarly for all n-gons), instead of changing individual uses of Segment. If you prefer not to change the varargs constructor of Ngon, then it might make sense to start improving the performance of individual use cases, but I think it would be helpful to first make that more general decision.

Another point of performance improvement might be the boundingbox implementations:
These have a lot of allocations, which shouldn't be necessary for any boundingbox calculation.
E.g. for this shape it's over 17kb of RAM

julia> poly
2 GeometrySet{2,Float64}
  └─PolyArea(50-Ring)
  └─PolyArea(356-Ring, 302-Ring)

julia> @time boundingbox(poly)
  0.000021 seconds (21 allocations: 17.172 KiB)
Box{2, Float64}(Point(0.604086399, -13.3619730947), Point(149.4063610679, 13.8699076501))

This isn't really a priority as it is still fast, but could be a problem when running in a tight loop.

@BloodWorkXGaming regarding the boundingbox allocations, did you identify the harmful lines? Feel free to submit PRs for review. If it is also related to the current internal representation of vertices in Polytope types, we will fix it in a more systematic way.

Ye I found some harmful lines, it is rather related to the MVectors and collect operations in the functions. Will try to open a PR after the weekend with some improvements :)

Update: we already implemented a series of changes to the vertices representation inside Polytope geometries. They are NTuple of Point now, which means, more compile-time optimizations.

Update: refactored boundingbox and hull methods to allocate zero memory whenever possible.

@BloodWorkXGaming all cases covered now from your previous PR.

Thanks! Sorry for not refactoring the PR, was very low on time the past weeks

Update: pointwise transforms no longer allocate intermediate vectors. Rotate, Translate and Stretch are examples.

We need to improve the performance of the FIST triangulation method. I think it is already type stable, we need algorithmic improvements and multiple threads now.