grimoire/mmdetection-to-tensorrt

Some questions about this project

MyDecember12 opened this issue · 5 comments

This is a very interesting and useful project, thank you for your contribution. I still have three questions:
1. I can get results with RetinaNet and YOLOv3, but two-stage detectors such as Faster R-CNN and Cascade R-CNN do not work; they crash with a Segmentation fault (core dumped).
2. I used fp16 mode, but the converted RetinaNet and YOLOv3 models are larger than the original mmdetection models, and the inference time is also very long, so there must be some problem, but I don't know where.
3. RetinaNet and YOLOv3 only work at the specified input size.
Finally, I would like to join the discussion group, please approve my request, thank you.

environment:

  • OS: [Ubuntu18.04]
  • python_version: [3.6]
  • pytorch_version: [1.3]
  • cuda_version: [cuda-10.0]
  • cudnn_version: [7.6.5]
  • mmdetection_version: [2.5.0]

Hi, thanks for using this project.
About your questions:

  1. I am not sure; it may be caused by amirstan_plugin. Are you using the latest version? Try rebuilding it and see if that works.
  2. Not all NVIDIA devices support fp16. Read this for details.
  3. For RetinaNet, the input tensor shape is decided by opt_shape_param. YOLOv3 uses a fixed input tensor shape in mmdetection. See the sketch below.
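
For example, a minimal sketch (the paths and the non-square opt shape are just placeholders): the three entries define the dynamic shape range the engine is built for. Any input whose shape falls between min and max should be accepted at inference time; the opt shape is only the one TensorRT tunes its kernels for.

# minimal sketch: build the engine with a dynamic input shape range
from mmdet2trt import mmdet2trt

cfg_path = "path/to/retinanet_config.py"       # placeholder
weight_path = "path/to/retinanet_weights.pth"  # placeholder
opt_shape_param = [
    [
        [1, 3, 320, 320],    # min shape: smallest input the engine must handle
        [1, 3, 800, 1344],   # opt shape: the size TensorRT tunes kernels for
        [1, 3, 1344, 1344],  # max shape: largest input the engine must handle
    ]
]
trt_model, wrap_model = mmdet2trt(cfg_path, weight_path,
                                  opt_shape_param=opt_shape_param)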

Thank you for your reply. First, I am using the latest version of amirstan_plugin. Second, the fp16 mode did take effect: I compared the model size before and after conversion, and the fp16 model is still larger than the mmdetection model. Finally, I have been setting sizes in opt_shape_param, but RetinaNet only works when the optimize shape is 800x800.

I would also like to ask: by how much is the size of your converted model reduced, and by how much is the inference time reduced?

I compared the model size before and after conversion, and the fp16 model is still larger than the mmdetection model.

That is expected; TensorRT is used to speed up inference, not to reduce disk usage.
Some devices do not support full-rate fp16, which means the converted model might not get any speedup from it.
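
If you want to check, something like the following should tell you (a minimal sketch; it only queries the device and TensorRT, it does not touch the converted model):

# sketch: report the GPU and whether the platform has full-rate fp16
import torch
import tensorrt as trt

print(torch.cuda.get_device_name(0))        # e.g. "Tesla V100-SXM2-16GB"
print(torch.cuda.get_device_capability(0))  # compute capability, e.g. (7, 0)

builder = trt.Builder(trt.Logger(trt.Logger.WARNING))
print("fast fp16:", builder.platform_has_fast_fp16)  # True when fp16 can actually give a speedup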

Could you provide more details about these problems, such as the GPU device, the convert/inference code, logs, test images, or anything else that might be related? I will see if I can do something.

Thank you very much; these problems have troubled me for a long time. My device is a Tesla V100. The convert and inference code is as follows:

import numpy as np
import tensorrt
import torch
from mmdet2trt import mmdet2trt
from mmdet2trt.apis.inference import init_detector
from mmdet2trt.apis.inference import inference_detector
import cv2
from os.path import expanduser
import time
#home = expanduser("~")
cfg_path="/home/package/mmdetection/configs/retinanet/retinanet_r50_fpn_1x_coco.py"
weight_path="/share1/retinanet_r50_fpn_1x_coco_20200130-c2398f9e.pth"
image_path="/share1/1.jpg"
save_path="/home/package/mmdetection-to-tensorrt/result/mm_test1.pth"
wrap_model_path="/home/package/mmdetection-to-tensorrt/result/wrap_model.pth"
engine_path = "/home/package/mmdetection-to-tensorrt/result/mm_test1.engine"
opt_shape_param=[
    [
        [1,3,320,320], # min shape
        [1,3,800,800], # optimize shape
        [1,3,1344,1344], # max shape
    ]
]
max_workspace_size=1<<30 # some modules need a large workspace; increase this when you hit OOM
# convert the mmdetection model to a TensorRT model (fp16, dynamic input shape)
trt_model, wrap_model = mmdet2trt(cfg_path, weight_path, opt_shape_param=opt_shape_param, fp16_mode=True, max_workspace_size=max_workspace_size)
torch.save(wrap_model.state_dict(), wrap_model_path)
torch.save(trt_model.state_dict(), save_path)
# reload the converted model for inference
trt_model = init_detector(save_path)

# run inference; the result holds [num_detections, bboxes, scores, class ids]
result = inference_detector(trt_model, image_path, cfg_path, "cuda:0")
num_detections = result[0].item()
trt_bbox = result[1][0]
trt_score = result[2][0]
trt_cls = result[3][0]
image = cv2.imread(image_path)
input_image_shape = image.shape
# draw the confident detections
for i in range(num_detections):
    score = trt_score[i].item()
    cls_id = int(trt_cls[i].item())
    if score < 0.7:
        continue
    bbox = tuple(int(v) for v in trt_bbox[i])

    # derive a per-class color from the bits of the class id
    color = ((cls_id >> 2 & 1) * 128 + (cls_id >> 5 & 1) * 128,
             (cls_id >> 1 & 1) * 128 + (cls_id >> 4 & 1) * 128,
             (cls_id >> 0 & 1) * 128 + (cls_id >> 3 & 1) * 128)
    cv2.rectangle(image, bbox[:2], bbox[2:], color, thickness=5)

# shrink the visualization if it is larger than 1280x720 (image.shape is (height, width, channels))
if input_image_shape[1] > 1280 or input_image_shape[0] > 720:
    scale = min(720 / image.shape[0], 1280 / image.shape[1])
    image = cv2.resize(image, (0, 0), fx=scale, fy=scale)
cv2.imwrite('./res.jpg',image)

# dump the serialized TensorRT engine so it can be reused without rebuilding
with open(engine_path, mode='wb') as f:
    f.write(trt_model.state_dict()['engine'])
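
For reference, the inference time can be measured on top of the script above roughly like this (a sketch; the warm-up loop, the iteration count and the torch.cuda.synchronize() calls are additions to make the GPU timing fair, not part of my original run):

# warm up so the measurement excludes one-time initialization cost
for _ in range(10):
    inference_detector(trt_model, image_path, cfg_path, "cuda:0")

# time n_runs inferences; synchronize so all GPU work is finished before reading the clock
torch.cuda.synchronize()
n_runs = 100
start = time.time()
for _ in range(n_runs):
    inference_detector(trt_model, image_path, cfg_path, "cuda:0")
torch.cuda.synchronize()
print("mean latency: {:.2f} ms".format((time.time() - start) / n_runs * 1000))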

image: (test image attachment)