Detection duplicates with fp16 on Jetson Nano (TensorRT v8.2.1.8)
IoannisKaragiannis opened this issue · 3 comments
Hey there Linamo1214,
First of all, great job with the trt repo. I have one question, though. Here is how I proceeded with the conversion.
On my laptop, running Ubuntu 22.04 without any NVIDIA GPU, I created a virtual environment with Python 3.10, installed all the essential packages for the yolov7 repo, and performed the .pt to .onnx conversion like this:
(yolov7)$ python3.10 export.py --weights my_models/yolov7-tiny.pt --grid --simplify --topk-all 200 --iou-thres 0.5 --conf-thres 0.4 --img-size 416 416
I deliberately did not set the --end2end flag here, so that I could apply it later, directly in the trt conversion.
Then I moved to my Jetson Nano. In my own small project I confirmed that the yolov7-tiny-416.onnx model from the above conversion works fine, with an average inference time of 99.5 ms. I then downloaded your repo on the Jetson Nano, created a dedicated virtual environment with Python 3.6 (to be compatible with tensorrt, which was also built against Python 3.6), and symbolically linked the natively built TensorRT like this:
(trt)$ ln -s /usr/lib/python3.6/dist-packages/tensorrt/ my_venvs/trt/lib/python3.6/site-packages/tensorrt
and then I proceeded with the .onnx to .trt conversion like this:
(trt)$ python3.6 export.py -o my_models/yolov7-tiny-416.onnx -e my_models/yolov7-tiny-416-fp16.trt -w 2 --iou_thres 0.5 --conf_thres 0.4 --end2end -p fp16 --max_det 200
The reason I set the maximum workspace size to 2GB was because I was getting the following error:
Some tactics do not have sufficient workspace memory to run. Increasing workspace size may increase performance, please check verbose output
and the reason I decided to use the -w flag in the first place is that I was getting the following error:
File "export.py", line 308, in <module>
main(args)
File "export.py", line 266, in main
builder = EngineBuilder(args.verbose, args.workspace)
File "export.py", line 109, in __init__
self.config.set_memory_pool_limit(trt.MemoryPoolType.WORKSPACE, workspace * (2 ** 30))
AttributeError: 'tensorrt.tensorrt.IBuilderConfig' object has no attribute 'set_memory_pool_limit'
So, basically, to overcome this, I had to make the following change in your export.py. I guess this was needed because of the old TensorRT version on the Jetson Nano.
# original
self.builder = trt.Builder(self.trt_logger)
self.config = self.builder.create_builder_config()
self.config.set_memory_pool_limit(trt.MemoryPoolType.WORKSPACE, workspace * (2 ** 30))
# self.config.max_workspace_size = workspace * (2 ** 30) # Deprecation
# updated
self.builder = trt.Builder(self.trt_logger)
self.config = self.builder.create_builder_config()
# self.config.set_memory_pool_limit(trt.MemoryPoolType.WORKSPACE, workspace * (2 ** 30))
self.config.max_workspace_size = workspace * (2 ** 30)  # deprecated API, but the only one available on TensorRT v8.2.1.8
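If it helps anyone else hitting the same AttributeError, the change can also be made version-agnostic so the same export.py runs both on the laptop and on the Nano. This is just a sketch of the workspace handling, assuming trt, self.builder and workspace are the same objects as in export.py:
self.builder = trt.Builder(self.trt_logger)
self.config = self.builder.create_builder_config()
if hasattr(self.config, "set_memory_pool_limit"):
    # newer TensorRT exposes the memory-pool API
    self.config.set_memory_pool_limit(trt.MemoryPoolType.WORKSPACE, workspace * (2 ** 30))
else:
    # older TensorRT (like v8.2.1.8 on the Nano) only has the deprecated attribute
    self.config.max_workspace_size = workspace * (2 ** 30)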
Then, based on your trt.py, I load the trt model in my application on the Jetson Nano. It loads successfully and the inference time drops from 99.5 ms to 61 ms, but I ran into two issues:
- the confidence scores are negative. As mentioned in a related thread, I bypassed this by adding 1 to them.
- I get duplicate detections, as if no NMS is applied. This is where I'm hitting a wall and need your help. I thought the --end2end flag would take care of applying the NMS, but it doesn't. Is this again because of the old TensorRT v8.2.1.8 implementation? Should I perhaps skip the --end2end flag entirely and let your inference function inside trt.py do the post-processing instead? What do you recommend?
Thanks in advance for your response! cheers
Actually, I observed something peculiar. I tried these two different combinations:
(yolov7)$ python3.10 export.py --weights my_models/yolov7-tiny.pt --grid --simplify --topk-all 200 --iou-thres 0.1 --conf-thres 0.4 --img-size 416 416
(trt)$ python3.6 export.py -o my_models/yolov7-tiny-416.onnx -e my_models/yolov7-tiny-416-fp16.trt -w 2 --iou_thres 0.1 --conf_thres 0.4 --end2end -p fp16 --max_det 200
and
(yolov7)$ python3.10 export.py --weights my_models/yolov7-tiny.pt --grid --simplify --topk-all 200 --iou-thres 0.7 --conf-thres 0.4 --img-size 416 416
(trt)$ python3.6 export.py -o my_models/yolov7-tiny-416.onnx -e my_models/yolov7-tiny-416-fp16.trt -w 2 --iou_thres 0.7 --conf_thres 0.4 --end2end -p fp16 --max_det 200
expecting that the first combination, with the small iou_thres, would result in a more permissive model that allows multiple detections of the same object, while the second combination would be more conservative and only let the most dominant detection survive. To my surprise, the two approaches showed absolutely no difference, as if the iou_thres flag does not affect the conversion at all.
Any idea why this is happening? Has anyone experienced something similar before?
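If I understand the yolov7 export script correctly, the --iou-thres given in the .pt to .onnx step is only consumed when --end2end is set there, so in my workflow it should have no effect at all; only the onnx-to-trt step should bake the threshold into the NMS plugin. A small sketch to double-check that the ONNX file itself carries no NMS node (the path is just the one from my commands above):
import onnx
model = onnx.load("my_models/yolov7-tiny-416.onnx")
nms_nodes = [n for n in model.graph.node if "NMS" in n.op_type.upper()]
print("NMS nodes in the graph:", [n.op_type for n in nms_nodes])
for node in nms_nodes:
    for attr in node.attribute:
        # prints e.g. iou_threshold / score_threshold if an NMS plugin was baked in at this stage
        print(node.op_type, attr.name, onnx.helper.get_attribute_value(attr))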
Ok, last update. I tried skipping the --end2end flag in both conversions (pt to onnx, and onnx to trt) and set the flag to False when I call inference, like this:
classIds, confidences, bboxs = self.inference(img, ratio, end2end=False)
Then everything works smoothly, since the post-processing takes care of the NMS, but the inference time increases; not dramatically, but it matters on a platform like the Nano. It's still faster than its onnx counterpart, but I think this approach is sub-optimal. Is there something special about fp16 that forces me down this path?
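For context, the extra cost in this mode is essentially a host-side NMS pass over the raw predictions. Roughly something like the sketch below (not your exact post-processing, just the idea; it assumes boxes as an (N, 4) array of [x, y, w, h] and scores as an (N,) array), and this CPU-side work is what the Nano pays for compared to the in-engine NMS:
import numpy as np
import cv2
def host_side_nms(boxes, scores, conf_thres=0.4, iou_thres=0.5):
    # suppress overlapping candidates on the CPU, as the non-end2end path has to do
    idxs = cv2.dnn.NMSBoxes(boxes.tolist(), scores.tolist(), conf_thres, iou_thres)
    idxs = np.array(idxs).flatten() if len(idxs) else np.array([], dtype=int)
    return boxes[idxs], scores[idxs]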
Thanks in advance for your support and I hope this issue will help someone in the future. Cheers
Looking forward to your reply!
@IoannisKaragiannis May I ask how you installed cuda-python on the Jetson Nano?