ethz-asl/aslam_optimizer

Speed improvements


[Image: callgrind profiler output]

Memory management accounts for up to ~30% of the computation time.

Functions to improve:

  • evaluateImplementation() should not make a copy of its return value on every call

_Idea_: Split the error-term evaluation into separate computeValue() and Eigen::MatrixXd& getValue() methods, and parse the expression tree in a preprocessing stage to trigger computeValue() in the correct order.
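A minimal sketch of the compute/get split, assuming values are plain doubles instead of Eigen::MatrixXd so the example stays self-contained. The class Node, the helpers postOrder() and evaluationOrder(), and the operator callbacks are hypothetical; only the computeValue()/getValue() split and the preprocessing idea come from the proposal above. A post-order depth-first traversal yields an order in which every node follows its children and shared subexpressions appear only once.

```cpp
#include <cassert>
#include <functional>
#include <unordered_set>
#include <utility>
#include <vector>

// Hypothetical expression node. computeValue() does the work once and stores
// the result; getValue() only hands out a reference to the stored result.
class Node {
 public:
  using Op = std::function<double(const std::vector<double>&)>;

  Node(std::vector<Node*> children, Op op)
      : children_(std::move(children)), op_(std::move(op)) {}

  // Assumes all children were computed already (guaranteed by the order below).
  void computeValue() {
    std::vector<double> inputs;
    inputs.reserve(children_.size());
    for (Node* c : children_) inputs.push_back(c->getValue());
    value_ = op_(inputs);
  }

  const double& getValue() const { return value_; }
  const std::vector<Node*>& children() const { return children_; }

 private:
  std::vector<Node*> children_;
  Op op_;
  double value_ = 0.0;
};

// Preprocessing: post-order DFS, so each node is scheduled after its children
// and a node reachable along several paths is scheduled only once.
void postOrder(Node* n, std::unordered_set<Node*>& visited, std::vector<Node*>& order) {
  assert(n != nullptr);
  if (!visited.insert(n).second) return;  // already scheduled
  for (Node* c : n->children()) postOrder(c, visited, order);
  order.push_back(n);
}

std::vector<Node*> evaluationOrder(Node* root) {
  std::unordered_set<Node*> visited;
  std::vector<Node*> order;
  postOrder(root, visited, order);
  return order;
}
```

Usage would be to call evaluationOrder() once on the root and then loop over the result calling computeValue(); afterwards every getValue() is a cheap reference access. The order only needs to be rebuilt when the tree structure changes.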

Based on the profiler statistics, the return-by-value overhead seems negligible and not worth optimizing.

Yes, I just realized why it didn't help reduce the memory-management overhead: most of these return matrices are statically sized Eigen matrices, so they are allocated purely on the stack and no malloc is involved.
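The stack-versus-heap point can be checked without Eigen. This sketch uses std::array and std::vector as stand-ins for fixed-size types such as Eigen::Vector3d and dynamic types such as Eigen::VectorXd (an analogy, not Eigen itself), and counts heap allocations by replacing the global operator new. All function names are hypothetical. Returning the fixed-size result by value should cost no heap allocation, while the dynamic one costs one.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>
#include <vector>

// Global counter: every heap allocation through operator new increments it.
static std::size_t g_heapAllocs = 0;

void* operator new(std::size_t size) {
  ++g_heapAllocs;
  if (size == 0) size = 1;  // malloc(0) may legally return nullptr
  if (void* p = std::malloc(size)) return p;
  throw std::bad_alloc();
}
void operator delete(void* p) noexcept { std::free(p); }
void operator delete(void* p, std::size_t) noexcept { std::free(p); }

// Fixed-size result: the three doubles live inside the returned object.
std::array<double, 3> fixedSizeValue(double s) { return {s, 2 * s, 3 * s}; }

// Dynamically sized result: the three doubles live in a heap buffer.
std::vector<double> dynamicSizeValue(double s) { return {s, 2 * s, 3 * s}; }

// Keeps the vector's buffer reachable so the compiler cannot elide the allocation.
std::vector<double> g_keep;

// Number of heap allocations caused by one fixed-size evaluation.
std::size_t allocsForFixed() {
  const std::size_t before = g_heapAllocs;
  std::array<double, 3> v = fixedSizeValue(1.0);
  assert(v[2] == 3.0);
  return g_heapAllocs - before;
}

// Number of heap allocations caused by one dynamic-size evaluation.
std::size_t allocsForDynamic() {
  const std::size_t before = g_heapAllocs;
  g_keep = dynamicSizeValue(1.0);  // move assignment, no extra allocation
  assert(g_keep[2] == 3.0);
  return g_heapAllocs - before;
}
```

Storing the vector in g_keep makes its buffer escape, so an optimizing compiler cannot drop the allocation and make the comparison meaningless.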

Splitting into compute and get as described above might still make sense, because it avoids duplicate computations (we could easily check this by counting the computations in each node).
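Counting computations per node, as suggested, might look like the following sketch. CountingNode is a hypothetical stand-in for an expression node whose value is the sum of its children (or a constant for leaves). With naive recursive evaluation, a subexpression that is used twice is computed twice, and the counts double at every level of sharing.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical node: leaves return leafValue, inner nodes sum their children.
struct CountingNode {
  std::vector<const CountingNode*> children;
  double leafValue = 0.0;
  mutable std::size_t computeCount = 0;  // how often evaluate() ran on this node

  // Naive recursive evaluation: shared children are re-evaluated on every use.
  double evaluate() const {
    ++computeCount;
    if (children.empty()) return leafValue;
    double sum = 0.0;
    for (const CountingNode* c : children) {
      assert(c != nullptr);
      sum += c->evaluate();
    }
    return sum;
  }
};
```

For a leaf x used by a = x + x, which in turn is used by b = a + a, one evaluation of b computes a twice and x four times; with a compute/get split each node would be computed once.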

The Jacobian evaluation is much harder to implement with statically sized matrices.

It is still unclear to me how much of the Jacobian evaluation (the bulk of the work) is actually spent on value evaluation. Probably not that much.

I made the CacheExpressions work with GenericMatrixExpressions, but there is still no runtime benefit for the probabilistic planner. I have to inspect the profiler statistics...

Which expressions did you cache? The squared velocity was a ScalarExpression, wasn't it?

Position and velocity vectors (though I don't think velocities are actually used in non-squared form anywhere). I hoped that caching the positions would reduce the overhead of addJacImpl in the splines.
I also ran the CacheExpressions with the isCacheValid = true assignments commented out, so that the evaluation is always performed. It seems the code in the CacheExpressions causes an additional overhead of 20% compared to not using CacheExpressions. In the meantime, the code has been pushed to #127.
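A simplified picture of a cache node with an isCacheValid flag, assuming a scalar value and a std::function callback. CachedExpression, invalidate() and evaluations() are hypothetical names, and the actual CacheExpressions code in #127 is likely structured differently. Commenting out the marked assignment reproduces the always-evaluate experiment described above.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>

// Hypothetical cache wrapper around an expensive scalar expression.
class CachedExpression {
 public:
  explicit CachedExpression(std::function<double()> evaluate)
      : evaluate_(std::move(evaluate)) {
    assert(static_cast<bool>(evaluate_));
  }

  double value() {
    if (!isCacheValid_) {
      cachedValue_ = evaluate_();
      ++evaluations_;
      isCacheValid_ = true;  // commenting this out forces re-evaluation on every call
    }
    return cachedValue_;
  }

  // Must be called whenever an underlying design variable changes.
  void invalidate() { isCacheValid_ = false; }

  std::size_t evaluations() const { return evaluations_; }

 private:
  std::function<double()> evaluate_;
  double cachedValue_ = 0.0;
  bool isCacheValid_ = false;
  std::size_t evaluations_ = 0;
};
```

When the cache never hits, every access pays for the flag check, the indirect call through std::function and, for matrix values, the copy into the cache on top of the evaluation itself; that is one plausible source of the measured overhead.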

Introduced a matrix stack to avoid memory allocations in #137.
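The matrix-stack idea can be sketched as one preallocated buffer from which temporaries are taken and returned in last-in, first-out order, so the evaluation itself never calls malloc. MatrixStack, MatrixView and ScopedMatrix are hypothetical names; the implementation in #137 may look quite different.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical matrix stack: one buffer allocated up front, temporaries
// carved out of it in LIFO order.
class MatrixStack {
 public:
  // Lightweight column-major view of a rows x cols block inside the buffer.
  struct MatrixView {
    double* data;
    std::size_t rows, cols;
    double& operator()(std::size_t r, std::size_t c) { return data[c * rows + r]; }
  };

  explicit MatrixStack(std::size_t capacity) : buffer_(capacity), top_(0) {}

  // Reserves a block; its contents are NOT cleared between uses.
  MatrixView push(std::size_t rows, std::size_t cols) {
    const std::size_t n = rows * cols;
    assert(top_ + n <= buffer_.size() && "matrix stack overflow");
    MatrixView view{buffer_.data() + top_, rows, cols};
    top_ += n;
    return view;
  }

  // Releases the most recently pushed block; must be called in LIFO order.
  void pop(const MatrixView& view) {
    const std::size_t n = view.rows * view.cols;
    assert(view.data + n == buffer_.data() + top_ && "pop out of LIFO order");
    top_ -= n;
  }

  std::size_t used() const { return top_; }

 private:
  std::vector<double> buffer_;
  std::size_t top_;
};

// RAII helper that pops its block when it goes out of scope.
class ScopedMatrix {
 public:
  ScopedMatrix(MatrixStack& stack, std::size_t rows, std::size_t cols)
      : stack_(stack), view_(stack.push(rows, cols)) {}
  ~ScopedMatrix() { stack_.pop(view_); }
  ScopedMatrix(const ScopedMatrix&) = delete;
  ScopedMatrix& operator=(const ScopedMatrix&) = delete;
  MatrixStack::MatrixView& view() { return view_; }

 private:
  MatrixStack& stack_;
  MatrixStack::MatrixView view_;
};
```

Tying the pop to scope exit matches the nested way temporaries appear during a recursive expression evaluation. Since pushed blocks are not cleared, callers have to write before reading.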

Very important to consider in order to avoid unnecessary memory allocations for temporaries: http://eigen.tuxfamily.org/dox/TopicWritingEfficientProductExpression.html
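The linked Eigen page explains that C = A * B evaluates the product into a temporary first, because C might alias A or B, while C.noalias() = A * B writes directly into C. The following plain C++ sketch mimics the noalias() path with a hypothetical multiplyInto() and a minimal Mat type (not Eigen), to show why the temporary can only be skipped once aliasing is ruled out.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Minimal dense row-major matrix, used only for this illustration.
struct Mat {
  std::size_t rows, cols;
  std::vector<double> data;
  Mat(std::size_t r, std::size_t c) : rows(r), cols(c), data(r * c, 0.0) {}
  double& at(std::size_t r, std::size_t c) { return data[r * cols + c]; }
  double at(std::size_t r, std::size_t c) const { return data[r * cols + c]; }
};

// Writes A * B straight into an existing C without a temporary, in the
// spirit of Eigen's C.noalias() = A * B. C must not alias A or B.
void multiplyInto(const Mat& A, const Mat& B, Mat& C) {
  assert(A.cols == B.rows && C.rows == A.rows && C.cols == B.cols);
  assert(&C != &A && &C != &B && "destination aliases an operand");
  for (std::size_t i = 0; i < A.rows; ++i) {
    for (std::size_t j = 0; j < B.cols; ++j) {
      double sum = 0.0;
      for (std::size_t k = 0; k < A.cols; ++k) sum += A.at(i, k) * B.at(k, j);
      C.at(i, j) = sum;
    }
  }
}
```

If C were the same object as A, the loop would overwrite entries of A that later iterations still read, which is why Eigen defaults to a temporary for products.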

TODO: optimize v^T * v
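One possible reading of this TODO (an assumption) is that v^T * v is evaluated as a general product that yields a 1x1 matrix. A specialized node could compute the value as a single accumulation and its Jacobian with respect to v as the row vector 2 v^T; in Eigen the value is available directly as v.squaredNorm(). The function names below are hypothetical, and std::vector replaces Eigen types.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Value of v^T v as one accumulation, without a 1x1 matrix temporary.
double squaredNorm(const std::vector<double>& v) {
  double sum = 0.0;
  for (double x : v) sum += x * x;
  return sum;
}

// Jacobian of v^T v with respect to v: the 1 x n row vector 2 v^T,
// written into a caller-provided buffer to avoid an allocation per call.
void squaredNormJacobian(const std::vector<double>& v, std::vector<double>& jac) {
  assert(jac.size() == v.size());
  for (std::size_t i = 0; i < v.size(); ++i) jac[i] = 2.0 * v[i];
}
```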

@HannesSommer, is there an easy way to show a PNG and a text output generated by a bash script in Jenkins?

I'm thinking of running callgrind on the profiling executable automatically...


Did a lot of speed tweaks and gained a speedup of around 100%.