DSVT-TRT deployment dynamic & static shaping
d33dler opened this issue · 1 comments
Experimenting on an RTX 2060, deploying only the DSVT module shows a mediocre improvement over Python (~.05 % speedup). This is despite studying the input statistics of my custom data and narrowing the dynamic shapes so that optShape sits very close to the usual inputs and the minShape/maxShape bounds are also tight. Moreover, I suspect the speed-up you observe is mostly due to fp16 conversion and is very hardware dependent, since the trtexec debug log says:
[09/19/2023-19:53:14] [W] [TRT] Myelin graph with multiple dynamic values may have poor performance if they differ. Dynamic values are:
[09/19/2023-19:53:14] [W] [TRT] voxel_number
[09/19/2023-19:53:14] [W] [TRT] set_number_shift_0
[09/19/2023-19:53:14] [W] [TRT] set_number_shift_1
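A minimal sketch of the kind of shape narrowing described above, assuming the per-frame voxel counts have already been collected. The tensor name `voxel_features`, the feature width `128`, and both helper functions are hypothetical placeholders. Only the `--minShapes`, `--optShapes` and `--maxShapes` flags are actual trtexec options.

```python
import numpy as np


def suggest_profile(voxel_counts, lo_pct=0.0, hi_pct=100.0):
    """Derive (min, opt, max) voxel counts from observed per-frame counts.

    With the default percentiles the bounds cover every observed frame;
    tighter percentiles narrow the profile but leave some frames outside it.
    """
    counts = np.asarray(voxel_counts, dtype=np.int64)
    min_n = int(np.floor(np.percentile(counts, lo_pct)))
    opt_n = int(np.median(counts))
    max_n = int(np.ceil(np.percentile(counts, hi_pct)))
    return min_n, opt_n, max_n


def trtexec_shape_flags(tensor_name, feat_dim, profile):
    """Format a (min, opt, max) profile as trtexec shape flags.

    tensor_name and feat_dim are assumptions about the exported ONNX graph.
    """
    min_n, opt_n, max_n = profile
    return [
        f"--minShapes={tensor_name}:{min_n}x{feat_dim}",
        f"--optShapes={tensor_name}:{opt_n}x{feat_dim}",
        f"--maxShapes={tensor_name}:{max_n}x{feat_dim}",
    ]


if __name__ == "__main__":
    # Made-up counts, for illustration only.
    observed = [100, 200, 300]
    profile = suggest_profile(observed)
    print(" ".join(trtexec_shape_flags("voxel_features", 128, profile)))
```

Frames whose voxel count falls outside the chosen bounds cannot run with that profile, so a tighter profile trades coverage for kernel specialisation.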
Did you try static shaping?
Assuming the point cloud scene keeps almost the same shape (static recording), I assume that to obtain consistent values after voxelisation we need a "point" mask for the point cloud input (points) that pads empty space across all borders, so that the number of voxels stays the same (correct me if I'm wrong).
What would be the steps to achieve this?
Hello! I've experimented with a static shape configuration, and found it to be faster than the dynamic setup.
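As a rough illustration of the padding idea from the question, the sketch below pads a variable number of voxels up to a fixed count and returns a validity mask. It is only a sketch under assumptions, not the configuration used in the experiment above: the function name `pad_voxels`, the array layouts, and the `-1` fill value for padded coordinates are all hypothetical. It also covers only the voxel dimension, while the log lists `set_number_shift_0` and `set_number_shift_1` as dynamic values that would need fixing too.

```python
import numpy as np


def pad_voxels(features, coords, fixed_n):
    """Pad voxel features/coords to exactly fixed_n rows and return a mask.

    features: (N, C) array of per-voxel features (assumed layout)
    coords:   (N, D) array of integer voxel coordinates (assumed layout)
    mask:     (fixed_n,) bool array, True for real voxels, False for padding
    """
    n = features.shape[0]
    if n > fixed_n:
        raise ValueError(f"{n} voxels exceed the fixed size {fixed_n}")

    padded_feats = np.zeros((fixed_n, features.shape[1]), dtype=features.dtype)
    # -1 marks padded coordinates; this sentinel is an assumption.
    padded_coords = np.full((fixed_n, coords.shape[1]), -1, dtype=coords.dtype)
    mask = np.zeros(fixed_n, dtype=bool)

    padded_feats[:n] = features
    padded_coords[:n] = coords
    mask[:n] = True
    return padded_feats, padded_coords, mask


if __name__ == "__main__":
    feats = np.ones((3, 4), dtype=np.float32)
    coords = np.arange(9, dtype=np.int32).reshape(3, 3)
    f, c, m = pad_voxels(feats, coords, fixed_n=5)
    print(f.shape, c.shape, int(m.sum()))
```

Downstream layers would have to ignore the masked rows (for example in attention and normalisation), otherwise the padded voxels change the result.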