grimoire/mmdetection-to-tensorrt

Not quite sure how to use batch inference

MysticStranger opened this issue · 5 comments

Hello! Thank you for the nice repository!

I have managed to get the basic optimization to work, i.e. I can convert models from FP32 to FP16 when batch_size = 1. For example, using:

opt_shape_param=[
[
[1,3,320,240], # min shape
[1,3,1333,800], # optimize shape
[1,3,1333,1333], # max shape
]
]

This gives me varying increases in performance depending on the model I optimize.
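
For reference, the conversion call I used looks roughly like the example in the repo's README (the paths are placeholders, and the exact keyword arguments should be double-checked against the README):

import torch
from mmdet2trt import mmdet2trt

cfg_path = 'path/to/mmdetection/config.py'   # mmdetection model config
weight_path = 'path/to/checkpoint.pth'       # trained mmdetection weights
save_path = 'path/to/optimized_model.pth'    # where to store the optimized model

opt_shape_param = [
    [
        [1, 3, 320, 240],    # min shape
        [1, 3, 1333, 800],   # optimize shape
        [1, 3, 1333, 1333],  # max shape
    ]
]

# fp16_mode=True builds the engine in half precision;
# opt_shape_param bounds the dynamic input shape
trt_model = mmdet2trt(cfg_path, weight_path,
                      opt_shape_param=opt_shape_param,
                      fp16_mode=True)
torch.save(trt_model.state_dict(), save_path)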

I am now trying to get a model optimized for 8 images as input, i.e.

opt_shape_param=[
[
[8,3,320,240], # min shape
[8,3,640,480], # optimize shape
[8,3,1080,720], # max shape
]
]

I have read that batch inference is supported, but when I look at the API for inference_detector, it only accepts a single image or image path. So I tried using TRTModule() directly, but now I am getting only zeroes as output, even though nothing seems to be wrong with building the engine.

Would you perhaps have the time to explain how to use batch inference with a simple example?
If you need any additional information about what I have done so far, I will post it here.

Thank you.

Hi
I recommend implementing the preprocessing yourself (even when batch_size=1). The API provided in this repo is only meant for evaluating the engine.
You can use image processing tools like NVIDIA NPP to further accelerate the preprocessing. That is out of scope for this conversion tool, so I do not plan to add an example here.

Would you mind sharing the model you are using with me? I will check whether anything is wrong with the batch inference.

Hey again!

You gave me the hint to what the problem was. My preprocessing only did normalization, but I forgot that mmdetection has an entire pipeline that applies multiple forms of image processing before feeding the data into a model.

My recommendation to anyone who wants to do batch inference is to use the test pipeline that exists within mmdetection (usage of this can be found in the inference_detector method for a single frame). Just preprocess the frames individually and then call torch.stack on a list of these frames as tensors. You can then feed this stack of tensors directly into the loaded trt_model. Don't forget to call .cuda() on your tensor before you run inference on it.

In my case I used a 32x4d ResNeXt with deformable convolutions on c3 to c5, so I took the data processing pipeline from coco_detection.py in mmdetection. I will post the code I used to get it working here later today, in case anyone else has the same problem.
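
For reference, the test pipeline in coco_detection.py is roughly the stock mmdetection one below (these are the standard COCO normalization values; your own config may differ, so check it rather than copying this verbatim):

img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='MultiScaleFlipAug',
        img_scale=(1333, 800),
        flip=False,
        transforms=[
            dict(type='Resize', keep_ratio=True),    # resize keeping aspect ratio
            dict(type='RandomFlip'),                 # a no-op at test time (flip=False)
            dict(type='Normalize', **img_norm_cfg),  # per-channel normalization
            dict(type='Pad', size_divisor=32),       # pad so H and W are divisible by 32
            dict(type='ImageToTensor', keys=['img']),
            dict(type='Collect', keys=['img']),
        ])
]

One thing to keep in mind: torch.stack only works if every preprocessed frame ends up with the same shape, which holds for video frames of a fixed resolution.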

By the way @grimoire, I used nvidia-driver 455 and didn't have any issues. Did you update the repository? I know some people had driver-related problems before.

Thank you.

Nice!
I updated the plugin code a few days ago. Glad to hear that 455 works.

Hey again.

Here is what I did for batch inference, for anyone who is wondering. Note that I am preparing each image individually, which is probably inefficient for real applications; you would want to preprocess in batches or parallelize the preprocessing of multiple images.

from mmdet2trt.apis.inference import init_detector
from mmdet.datasets.pipelines import Compose
import torch
import cv2
import time
import mmcv

videoPath = 'your path to video'
modelPath = 'your saved .pth file optimized by tensorrt'
cfg_path = 'your mmdetection model config (.py)'

def preprocess(img, cfg):
    # Preprocess one image with the mmdetection test pipeline.
    # Returns a tensor of size (3, H, W).
    if isinstance(cfg, str):
        cfg = mmcv.Config.fromfile(cfg)

    # feed the decoded array directly instead of loading from disk
    data = dict(img=img)
    cfg = cfg.copy()
    # set loading pipeline type
    cfg.data.test.pipeline[0].type = 'LoadImageFromWebcam'

    test_pipeline = Compose(cfg.data.test.pipeline)

    # run the pipeline and return the processed image tensor
    data = test_pipeline(data)
    return data['img'][0]

frames = []
results = []
trt_model = init_detector(modelPath)
count = 0
start = time.time()
cap = cv2.VideoCapture(videoPath)
while True:
    ret, frame = cap.read()
    if not ret:
        break
    frame = preprocess(frame, cfg_path)
    frames.append(frame)

    count += 1
    if count % 8 == 0:  # my model was optimized for a batch size of 8
        inf = torch.stack(frames, dim=0).cuda()
        result = trt_model(inf)
        frames = []  # note: any trailing frames (count % 8 != 0) are dropped
        results.append(result)

print('number of frames analyzed:', count)
print('Time:', time.time() - start)
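
If you want to turn the raw engine output into per-image detections, my understanding (treat this as a sketch; check mmdet2trt/apis/inference.py for the actual output layout) is that the model returns a tuple of detection counts, boxes, scores and class ids, each with a leading batch dimension:

# Hypothetical unpacking, assuming the engine outputs
# (num_detections, bboxes, scores, class_ids) like the repo's inference helper.
for result in results:
    num_dets, bboxes, scores, cls_ids = result
    for b in range(bboxes.shape[0]):  # iterate over the batch dimension
        n = int(num_dets[b])          # valid detections for image b
        print(bboxes[b, :n], scores[b, :n], cls_ids[b, :n])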

@MysticStranger I tried your solution but I'm getting the following error:

Traceback (most recent call last):
  File "infer_rt.py", line 58, in <module>
    image_batch = torch.stack(image_batch, axis = 0).cuda()
TypeError: expected Tensor as element 0 in argument 0, but got DataContainer

Here is my inference script:

import mmcv
import os
import time
import cv2
import torch
from mmdet2trt.apis import init_detector
from mmdet.datasets.pipelines import Compose

# Specify the path to model config and checkpoint file
config_file = '/models/faster_rcnn_r50_fpn_mdconv_c3-c5_group4_1x_coco.py'
checkpoint_file = '/models/output.trt'

# build the model from a config file and a checkpoint file
model = init_detector(checkpoint_file)

def preprocess(img, cfg):
    # Preprocess one image with the mmdetection test pipeline.
    # Returns a tensor of size (3, H, W).
    if isinstance(cfg, str):
        cfg = mmcv.Config.fromfile(cfg)

    # feed the decoded array directly instead of loading from disk
    data = dict(img=img)
    cfg = cfg.copy()
    # set loading pipeline type
    cfg.data.test.pipeline[0].type = 'LoadImageFromWebcam'

    test_pipeline = Compose(cfg.data.test.pipeline)

    # run the pipeline and return the processed image tensor
    data = test_pipeline(data)
    return data['img'][0]

def draw_label(image, point, label, font=cv2.FONT_HERSHEY_SIMPLEX,
               font_scale=0.5, thickness=2):
    size = cv2.getTextSize(label, font, font_scale, thickness)[0]
    x, y = point
    cv2.rectangle(image, (x, y - size[1]),
                  (x + size[0], y), (255, 0, 0), cv2.FILLED)
    cv2.putText(image, label, point, font, font_scale,
                (255, 255, 255), thickness)

# run batched inference on a folder of images and time it
IMAGE_PATH = "/models/dataset/sort_potato_l2/val/images"
image_batch = []
for img in os.listdir(IMAGE_PATH):
    if img.endswith("jpg"):
        # print("inferencing on ", img)
        image = os.path.join(IMAGE_PATH, img)
        image = cv2.imread(image)
        image = preprocess(image, config_file)
        image_batch.append(image)
    if len(image_batch) == 4:
        tic = time.time()
        image_batch = torch.stack(image_batch, axis = 0).cuda()
        results = model(image_batch)
        print("FPS: ", 1/(time.time()-tic))
        image_batch = []
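
In case anyone hits the same DataContainer error: newer mmdet/mmcv versions wrap the pipeline output in a DataContainer instead of returning a plain tensor, which is what torch.stack then chokes on. A minimal sketch of one way to unwrap it (assuming mmcv's DataContainer; I have not run this against the exact setup above):

from mmcv.parallel import DataContainer

# inside preprocess(), replace `return data['img'][0]` with:
img = data['img'][0]
if isinstance(img, DataContainer):
    img = img.data  # unwrap the underlying torch.Tensor
return img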