RobertLeahy/KinFu

Get rid of branch in sum by doing linearized sum when we get to an odd number

Closed this issue · 5 comments

Once few enough elements are left to sum, the overhead of launching the parallel sum kernel might outweigh the benefit of summing them in parallel. Look into this.
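A rough host-side sketch of the idea, to make the proposal concrete. This is not the actual kernel: `hybrid_sum` and its `cutoff` parameter are hypothetical names, and each pairwise pass here only stands in for one launch of the parallel sum kernel. Pairwise passes run only while the element count is even (and above the cutoff), so the pass never needs a branch for a leftover odd element; whatever remains is summed linearly.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical model of the reduction (not the project's kernel).
// Each iteration of the while loop stands in for one parallel kernel launch.
float hybrid_sum(std::vector<float> values, std::size_t cutoff = 1) {
    assert(cutoff >= 1);
    std::size_t n = values.size();

    // Pairwise passes: only taken while the count is even and above the
    // cutoff, so no pass has to special-case an unpaired trailing element.
    while (n > cutoff && n % 2 == 0) {
        const std::size_t half = n / 2;
        for (std::size_t i = 0; i < half; ++i) {
            values[i] += values[i + half];
        }
        n = half;
    }

    // Linear tail: the count is now odd or at/below the cutoff, so finish
    // with a plain sequential sum over the remaining n partial sums.
    return std::accumulate(values.begin(),
                           values.begin() + static_cast<std::ptrdiff_t>(n),
                           0.0f);
}
```

With 12 inputs this does two pairwise passes (12 to 6 to 3) and then sums the last 3 values linearly. Raising `cutoff` ends the parallel passes earlier, which is the knob to tune if the launch overhead turns out to dominate for small counts.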

Before:

Correspondences took 11510212 nanoseconds
Sum took 14743760 nanoseconds
Correspondences took 2269578 nanoseconds
Sum took 14416479 nanoseconds
Correspondences took 2208341 nanoseconds
Sum took 14624802 nanoseconds
Correspondences took 2226800 nanoseconds
Sum took 14507016 nanoseconds
Correspondences took 2175818 nanoseconds
Sum took 14719734 nanoseconds
Correspondences took 2187245 nanoseconds
Sum took 14613375 nanoseconds
Correspondences took 2214494 nanoseconds
Sum took 14877954 nanoseconds
Correspondences took 2216838 nanoseconds
Sum took 14635350 nanoseconds
Correspondences took 3013212 nanoseconds
Sum took 14844552 nanoseconds
Correspondences took 2182264 nanoseconds
Sum took 14583489 nanoseconds
Correspondences took 2170251 nanoseconds
Sum took 14427027 nanoseconds
Correspondences took 2184315 nanoseconds
Sum took 14485334 nanoseconds
Correspondences took 2163805 nanoseconds
Sum took 14493831 nanoseconds
Correspondences took 2196328 nanoseconds
Sum took 14413549 nanoseconds
Correspondences took 2177283 nanoseconds
Sum took 14241265 nanoseconds

Before:

PS C:\Users\rleahy\Documents\C++\SENG499> bin/client.exe --dataset=C:/Users/rleahy/Downloads/heads/heads/seq-01/seq-01 --max-frames=3
Depth frame took 14ms
Measurement took 12ms
Pose estimation took 0ms
Updating reconstruction took 5ms
Surface prediction took 25ms
Frame took 84ms
Finished frame 1 / 3
Depth frame took 13ms
Measurement took 0ms
Pose estimation took 276ms
Updating reconstruction took 0ms
Surface prediction took 11ms
Frame took 323ms
Finished frame 2 / 3
Depth frame took 12ms
Measurement took 0ms
Pose estimation took 262ms
Updating reconstruction took 0ms
Surface prediction took 11ms
Frame took 308ms
Finished frame 3 / 3
Frames: 3
Average time per frame: 238ms

For some reason, removing the branch here and everything below it seems to add ~60ms to the running time of the pose estimation pipeline block.

I do not understand this.

If I remove the branch and everything below it, and manually inline vectoradd6 and matrixadd6, the runtime of the pose estimation block somehow balloons from ~280ms to ~960ms...