Turoad/CLRNet

Using branch deploy, C++ version on Xavier gives worse results


I am using the deploy branch. I built the .so for the unsupported ops, and the Python scripts work fine; the results are nearly the same.
I then moved to Xavier (aarch64), generated the ONNX file and the .so file there, and used trtexec with the plugin to build the engine file.
The results became worse.
There are two differences. First, the TensorRT version on Xavier is 7.1, while the deploy branch uses 7.2.
Second, I suspect the preprocessing, but I have made no progress there.
The pipeline is: crop -> resize -> np.float32 -> /255.0 -> add a dim and reorder the dims to 3201.
Maybe there are small differences between the methods (cv, transforms, imgaug) and in the step order, but I checked the input and found the difference is really small.
Really need some help!

The difference in performance shows up in the images: one or two lanes are missed, and lanes on the left are worse than lanes on the right. The far end of each lane tends to drift toward the middle.
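For reference, the preprocessing chain described above could be sketched roughly as follows. This is only a NumPy illustration, not the code from the deploy branch: `preprocess`, `cut_height`, `out_w` and `out_h` are hypothetical names, the resize is a plain nearest-neighbour stand-in for whatever `cv2`/imgaug resize is actually used, and "dims to 3201" is read here as appending a trailing axis and then transposing with axis order (3, 2, 0, 1), which yields NCHW.

```python
import numpy as np


def preprocess(img, cut_height, out_w, out_h):
    """crop -> resize -> float32 -> /255 -> NCHW with a batch dimension of 1.

    All parameter names are hypothetical; fill in the values from your own config.
    """
    # Crop: drop the top `cut_height` rows of the HWC image.
    cropped = img[cut_height:, :, :]
    h, w = cropped.shape[:2]

    # Resize: nearest-neighbour sampling (assumption: a stand-in for the real
    # interpolation, which is one place where pipelines can differ slightly).
    rows = np.arange(out_h) * h // out_h
    cols = np.arange(out_w) * w // out_w
    resized = cropped[rows][:, cols]

    # Convert to float32 and scale to [0, 1].
    scaled = resized.astype(np.float32) / 255.0

    # HWC -> HWC1 -> axis order (3, 2, 0, 1) -> 1CHW.
    nchw = scaled[..., np.newaxis].transpose(3, 2, 0, 1)

    # The engine input buffer expects contiguous memory.
    return np.ascontiguousarray(nchw)
```

Dumping this tensor on both machines and comparing it element by element is one way to rule out the preprocessing before looking at the engine itself.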

@mengxia1994
TRT 8.5.x works fine for me, and the preprocessing is the same as yours:
CROP --> RESIZE --> 1/255 --> NCHW. No other transforms are applied.
I recommend you check the model output first. The outputs should be the same, or at least 99% the same (ONNX and TRT sometimes produce different results if you choose the wrong data type (float, double, ...)).
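A rough way to put a number on "99% the same" is to dump the raw output of each backend to an array and compare them element-wise. The sketch below is just one possible check; `compare_outputs` and its tolerances are hypothetical and not part of CLRNet or TensorRT.

```python
import numpy as np


def compare_outputs(ref, test, rtol=1e-3, atol=1e-4):
    """Compare two model outputs element-wise.

    `ref` is the trusted output (for example from ONNX), `test` is the output
    under suspicion (for example from the TRT engine). Tolerances are
    placeholder values, not recommendations.
    """
    ref = np.asarray(ref)
    test = np.asarray(test)
    if ref.shape != test.shape:
        raise ValueError(f"shape mismatch: {ref.shape} vs {test.shape}")

    # Compare in float64 so the comparison itself adds no rounding error.
    ref64 = ref.astype(np.float64).ravel()
    test64 = test.astype(np.float64).ravel()

    abs_diff = np.abs(ref64 - test64)
    # np.isclose treats its second argument as the reference value.
    close = np.isclose(test64, ref64, rtol=rtol, atol=atol)

    return {
        "max_abs_diff": float(abs_diff.max()),
        "mean_abs_diff": float(abs_diff.mean()),
        "match_ratio": float(close.mean()),
    }
```

A `match_ratio` well below 0.99 with identical inputs would point at the engine (ops, precision, or a buffer read with the wrong dtype) rather than at the preprocessing.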

Thank you~ Yes, the model outputs are at least 99% the same between pth, ONNX and TRT on an x86_64 training machine. But I have to run it on Xavier, and for the sake of overall system stability it would take months to upgrade JetPack. The model outputs are quite different on Xavier.
According to the current clues, this is probably caused by differences in some ops between TRT 7.1 and newer versions.

Did you make it work with TRT 7.1.x? I'm trying to deploy it on Xavier or Nano. If it didn't work in the end, I'll probably need to make it work on TRT 7.1.x myself, so please let me know.