grimoire/mmdetection-to-tensorrt

GPU Memory

Closed this issue · 7 comments

Thanks for your useful project. It works really well on a Tesla T4. I tested several algorithms; on average they run about three times faster, and some models, such as HTC with an R-50-FPN backbone, see roughly an 8x speed-up.

Since MMDetection does not support YOLOv4 yet, I analyzed TensorRT's performance for YOLOv4 using the project 'Tianxiaomo/pytorch-YOLOv4'. I got about a 2x speed-up compared to Darknet, and GPU memory usage dropped to about 1/4 with the TensorRT engine compared to the PyTorch model.

With your project, however, GPU memory usage increases. From what I understand of TensorRT's optimizations, memory usage should decrease thanks to dynamic tensor memory.

Could you please look into this problem?

environment:

  • OS: CentOS
  • python_version: 3.6
  • pytorch_version: 1.5.1
  • cuda_version: cuda-10.2
  • cudnn_version: 7.6.5
  • mmdetection_version: 2.5.0

hey, three times faster, fp16 or fp32?

fp16, of course

Dynamic shape mode allocates the workspace according to the max shape. It is better to set min_shape=opt_shape=max_shape if you do not need dynamic shape input.
Using a smaller max_workspace_size should also help reduce memory usage.
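
For illustration, here is a minimal sketch of those two settings using the raw TensorRT Python API (TensorRT 7.x era). The ONNX file path, the fixed shape, and the tensor name "input" are placeholders, not this project's actual build code:

import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(TRT_LOGGER)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, TRT_LOGGER)
with open("model.onnx", "rb") as f:  # placeholder: an ONNX export of the detector
    parser.parse(f.read())

config = builder.create_builder_config()
config.max_workspace_size = 1 << 28  # 256 MB: caps TensorRT's scratch memory

profile = builder.create_optimization_profile()
shape = (1, 3, 800, 1344)  # placeholder fixed input shape
# min == opt == max, so no extra workspace is reserved for larger shapes
profile.set_shape("input", shape, shape, shape)  # "input" is a placeholder tensor name
config.add_optimization_profile(profile)

engine = builder.build_engine(network, config)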

Hello, regarding "GPU memory usage will be increased": did you solve this problem?

Dynamic shape mode allocates the workspace according to the max shape. It is better to set min_shape=opt_shape=max_shape if you do not need dynamic shape input.
Using a smaller max_workspace_size should also help reduce memory usage.

Hi, great job !
I have also meet the "GPU memory-usage will be increased" problem. I have set min_shape=opt_shape=max_shape, max_workspace_size=1<<28(256MB). But the problem exists in inference phase, I tested my model using 1000 images (loops for simulate as below), the cuda memory increase from 2434MiB to 3255MiB. If I test with more images, there will be "cuda out of memory". What should I do to solve this?

for i in range(1000):
    with torch.no_grad():
        result = model(tensor)
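
To help localize the growth, here is a small diagnostic variant of that loop (only a sketch, reusing model and tensor from the snippet above):

import torch

torch.cuda.reset_peak_memory_stats()
for i in range(1000):
    with torch.no_grad():
        result = model(tensor)
    if i % 100 == 0:
        # memory_allocated() only covers tensors managed by PyTorch's caching
        # allocator; memory held inside the TensorRT execution context shows
        # up in nvidia-smi but not here, which helps tell the two apart
        print(i,
              torch.cuda.memory_allocated() // 2**20, "MiB allocated,",
              torch.cuda.max_memory_allocated() // 2**20, "MiB peak")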

@XiaYuanxiang Are you using TensorRT 7.0?
If so, please update TensorRT and try again. Read this for details.
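
For reference, you can check which TensorRT version is installed with:

import tensorrt
print(tensorrt.__version__)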


Yes, I'm using the Docker image you provided. It seems to be a bug in TensorRT. Thanks for your reply.