Sense-X/Co-DETR

Why is my GPU memory usage so high? Is this normal?

tyloocifer opened this issue · 13 comments

When I use a 3090 with images resized to 1920*1080 and batchsize=1, GPU memory runs out immediately. How can I solve this, and which step causes such a large overhead?

I used co_dino_5scale_lsj_r50_1x_coco.py in the MMDetection project.

When I change the model to DINO it works, but co_dino doesn't. I tried reducing num_co_heads and using only Faster R-CNN or ATSS as the auxiliary head, but it still requires a lot of memory.

LSJ aug requires more memory than DETR aug. If you adopt a resolution of 1920x1080, it's better to use the config co_dino_5scale_r50_1x_coco.py.
Besides, you can enable gradient checkpointing by adding with_cp=True to the backbone config and changing 'with_cp' in the encoder config from 4 to 6:

backbone=dict(
    type='ResNet',
    depth=50,
    num_stages=4,
    out_indices=(0, 1, 2, 3),
    frozen_stages=1,
    norm_cfg=dict(type='BN', requires_grad=False),
    norm_eval=True,
    style='pytorch',
    with_cp=True,
    init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50')),
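The encoder-side change mentioned above could look roughly like the fragment below. This is a sketch, not a copy of the repository config: it assumes that `with_cp` in the encoder is the number of encoder layers that use checkpointing, and that the encoder has 6 layers of type `DetrTransformerEncoder`. All other encoder keys are omitted and should stay as they are in the config you use.

```python
# Hypothetical excerpt of the transformer encoder config.
# Assumption: with_cp counts the encoder layers that use checkpointing.
encoder = dict(
    type='DetrTransformerEncoder',  # assumed type name, keep whatever your config has
    num_layers=6,                   # assumed layer count
    with_cp=6,                      # was 4; checkpoint every encoder layer
    # ... remaining keys unchanged ...
)
```

Checkpointing trades compute for memory: activations are recomputed during the backward pass, so training gets slower as more layers are checkpointed.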

It still doesn't work. I only adopted your network; I didn't use your dataloader or other config settings. With DINO it allocates just 10 GB at batchsize=1, but co_dino_r50_1x can't run. It reports CUDA out of memory.

Do you use DINO-4scale?

Yep.

Perhaps I need to change it to 4-scale?

Yes, the 5-scale model consumes much more memory than the 4-scale one.
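A rough way to see why: the encoder attends over every position of every feature level, so the token count grows quickly when a stride-4 level is added. The sketch below counts tokens for a 1920x1080 input. The strides are assumptions (8, 16, 32, 64 for 4-scale; 4, 8, 16, 32, 64 for 5-scale), and padding in a real pipeline will shift the numbers slightly.

```python
import math

def encoder_tokens(height, width, strides):
    """Total number of feature-map positions over all levels (ceil division per level)."""
    return sum(math.ceil(height / s) * math.ceil(width / s) for s in strides)

H, W = 1080, 1920
# Assumed strides; check num_feature_levels and out_indices in your config.
four_scale = encoder_tokens(H, W, (8, 16, 32, 64))
five_scale = encoder_tokens(H, W, (4, 8, 16, 32, 64))

print(four_scale, five_scale, round(five_scale / four_scale, 2))
```

Under these assumptions the 5-scale setup has about four times as many encoder tokens as the 4-scale one. Token count is only a proxy for memory, but it indicates where the extra cost comes from.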

I use projects/configs/co_dino/co_dino_5scale_swin_large_16e_o365tococo.py, and it seems that if I freeze the backbone and set the checkpoint to False, it goes OOM on a 24 GB A30.

Co-DETR with frozen SwinL and image size 1333x800 requires more than 15GB memory. The config you use enlarges the resolution by 1.5x and 24GB memory may be insufficient. AMP and FSDP can help you to reduce the training memory.
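For AMP, a minimal sketch is shown below. It assumes the MMDetection 2.x convention in which a top-level `fp16` dict in the config enables mixed-precision training. Whether every custom op in Co-DETR trains stably in fp16 is not confirmed in this thread, so treat it as something to try rather than a guaranteed fix.

```python
# Assumption: MMDetection 2.x style config; add at the top level of the config file.
# 'dynamic' lets the loss scale adapt during training instead of using a fixed value.
fp16 = dict(loss_scale='dynamic')
```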

If I want a 4-scale model, what should I change besides the config file?

The total loss has been oscillating around 20. Is this normal?