grimoire/mmdetection-to-tensorrt

GPU Memory

Closed this issue · 7 comments

Thanks for your useful project. It works really well on a Tesla T4. I tested several algorithms; on average they run about three times faster, and some models, such as HTC with an R-50-FPN backbone, see roughly an 8x speed-up.

Since MMDetection does not support YOLOv4 yet, I analyzed TensorRT's performance for YOLOv4 using the project 'Tianxiaomo/pytorch-YOLOv4'. I got about a 2x speed-up compared to Darknet, and GPU memory usage dropped to about 1/4 with the TensorRT engine compared to the PyTorch model.

With your project, however, GPU memory usage increases. From what I understand of TensorRT's optimizations, memory usage should decrease thanks to dynamic tensor memory.

Could you please look into this problem?

environment:

  • OS: CentOS
  • python_version: 3.6
  • pytorch_version: 1.5.1
  • cuda_version: cuda-10.2
  • cudnn_version: 7.6.5
  • mmdetection_version: 2.5.0

hey, three times faster, fp16 or fp32?

fp16, of course

Dynamic shape mode allocates the workspace according to the max shape. It is better to set min_shape=opt_shape=max_shape if you do not need dynamic shape input.
Using a smaller max_workspace_size should also help reduce memory usage.
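
For illustration, here is a minimal sketch of those two settings using the raw TensorRT Python API (TensorRT 7.x era). The ONNX file path, the fixed shape, and the tensor name "input" are placeholders, not this project's actual build code:

import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(TRT_LOGGER)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, TRT_LOGGER)
with open("model.onnx", "rb") as f:  # placeholder: an ONNX export of the detector
    parser.parse(f.read())

config = builder.create_builder_config()
config.max_workspace_size = 1 << 28  # 256 MB: caps TensorRT's scratch memory

profile = builder.create_optimization_profile()
shape = (1, 3, 800, 1344)  # placeholder fixed input shape
# min == opt == max, so no extra workspace is reserved for larger shapes
profile.set_shape("input", shape, shape, shape)  # "input" is a placeholder tensor name
config.add_optimization_profile(profile)

engine = builder.build_engine(network, config)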

Hello, regarding "GPU memory usage will be increased": did you solve this problem?

Dynamic shape mode allocates the workspace according to the max shape. It is better to set min_shape=opt_shape=max_shape if you do not need dynamic shape input.
Using a smaller max_workspace_size should also help reduce memory usage.

Hi, great job !
I have also meet the "GPU memory-usage will be increased" problem. I have set min_shape=opt_shape=max_shape, max_workspace_size=1<<28(256MB). But the problem exists in inference phase, I tested my model using 1000 images (loops for simulate as below), the cuda memory increase from 2434MiB to 3255MiB. If I test with more images, there will be "cuda out of memory". What should I do to solve this?

for i in range(1000):
    with torch.no_grad():
        result = model(tensor)
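
To help localize the growth, here is a small diagnostic variant of that loop (only a sketch, reusing model and tensor from the snippet above):

import torch

torch.cuda.reset_peak_memory_stats()
for i in range(1000):
    with torch.no_grad():
        result = model(tensor)
    if i % 100 == 0:
        # memory_allocated() only covers tensors managed by PyTorch's caching
        # allocator; memory held inside the TensorRT execution context shows
        # up in nvidia-smi but not here, which helps tell the two apart
        print(i,
              torch.cuda.memory_allocated() // 2**20, "MiB allocated,",
              torch.cuda.max_memory_allocated() // 2**20, "MiB peak")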

@XiaYuanxiang Are you using TensorRT 7.0?
If so, please update TensorRT and try again. Read this for details.
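
For reference, you can check which TensorRT version is installed with:

import tensorrt
print(tensorrt.__version__)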


Yes, I'm using the Docker image you provided. It seems to be a bug in TensorRT. Thanks for your reply.