grimoire/mmdetection-to-tensorrt

Problem with fp16 detection results (zero x value)

Kaeseknacker opened this issue · 5 comments

Describe the bug
I have a problem with an FP16-converted model that produces wrong detection results. For some boxes, the x value of the top-left corner is 0, so those boxes are always stretched to the left edge of the image. It must be related to the conversion, because the problem does not occur with the original model. I also could not reproduce it with the FP32 conversion.

Here is an example:
[Image: frame100470_mmdet] MMDetection result

[Image: frame100470_trt] TRT_FP32 result

[Image: frame100470_trt_orig] TRT_FP16 result

As you can see in the last image, some bounding boxes are stretched to the left edge of the image.
Does anyone have an idea what is happening here?
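To quantify the symptom instead of judging it by eye, the detections of the FP16 engine can be compared against a reference (the original model or the FP32 engine). The following is a minimal sketch, not part of mmdetection-to-tensorrt: it assumes each detection is an `(x1, y1, x2, y2, score)` tuple in pixels, and the helper names, the tolerances, and the example values are hypothetical. The `score_thr` default of 0.05 mirrors the `test_cfg` in the config file.

```python
from typing import List, Sequence, Tuple

# Assumed layout of one detection: x1, y1, x2, y2, score (pixels).
Box = Tuple[float, float, float, float, float]


def left_edge_boxes(boxes: Sequence[Box],
                    score_thr: float = 0.05,
                    eps: float = 0.5) -> List[Box]:
    """Return boxes scoring at least score_thr whose x1 is within eps px of 0."""
    return [b for b in boxes if b[4] >= score_thr and abs(b[0]) <= eps]


def suspicious_boxes(candidate: Sequence[Box],
                     reference: Sequence[Box],
                     score_thr: float = 0.05,
                     eps: float = 0.5,
                     y_tol: float = 2.0) -> List[Box]:
    """Left-edge boxes in candidate with no left-edge counterpart in reference.

    A box that legitimately touches the left edge should also appear at the
    left edge in the reference output, so it is matched by y1/y2 and skipped.
    """
    ref_edge = left_edge_boxes(reference, score_thr, eps)
    flagged = []
    for box in left_edge_boxes(candidate, score_thr, eps):
        matched = any(
            abs(box[1] - ref[1]) <= y_tol and abs(box[3] - ref[3]) <= y_tol
            for ref in ref_edge
        )
        if not matched:
            flagged.append(box)
    return flagged


# Made-up example values, only to illustrate the call.
fp32 = [(412.0, 230.0, 450.0, 290.0, 0.81), (0.0, 500.0, 60.0, 560.0, 0.70)]
fp16 = [(0.0, 230.5, 450.0, 290.0, 0.80), (0.0, 500.0, 60.0, 560.0, 0.70)]

suspects = suspicious_boxes(fp16, fp32)
print(f"{len(suspects)} suspicious box(es): {suspects}")
```

Running this over a handful of frames gives a count per engine, which makes it easier to tell a broken FP16 build from a working one.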

Environment:

  • OS: Debian 11
  • python_version: 3.7
  • pytorch_version: 1.8.0
  • cuda_version: cuda_11.1.1_455.32.00_linux
  • tensorRT: TensorRT_7.2.2.3.Ubuntu-18.04.x86-64_gnu.cuda-11.1.cudnn8.0
  • cudnn_version: 8.0.5.39
  • mmdetection_version: 2.12.0
  • mmdetection-to-tensorrt_version: 0.3.0
  • GPU: RTX2070 Super

Conversion Log

mmdet2trt --save-engine=true --min-scale 1 3 1080 1080 --opt-scale 1 3 1080 1920 --max-scale 1 3 1920 1920 ./fcos_r50_caffe_fpn_gn-head_1x.py ./epoch_12.pth trt-detector-pers-veh_fcos-r50-fpn_cc7.5_cu11.1_trt7.2.2.3_0.1.0.trt --fp16 True |& tee conversion.log


WARNING:root:module mmdet.models.dense_heads.TransformerHead not exist.
INFO:mmdet2trt:Loading model from config: ./fcos_r50_caffe_fpn_gn-head_1x.py
INFO:mmdet2trt:Wrapping model
INFO:mmdet2trt:Model warmup
INFO:mmdet2trt:Converting model
[TensorRT] INFO: Some tactics do not have sufficient workspace memory to run. Increasing workspace size may increase performance, please check verbose output.
[TensorRT] INFO: Detected 1 inputs and 4 output network tensors.
INFO:mmdet2trt:Conversion took 113.13975644111633 s
INFO:mmdet2trt:Saving TRT model to: trt-detector-pers-veh_fcos-r50-fpn_cc7.5_cu11.1_trt7.2.2.3_0.1.0.trt
INFO:mmdet2trt:Saving TRT model engine to: trt-detector-pers-veh_fcos-r50-fpn_cc7.5_cu11.1_trt7.2.2.3_0.1.0.engine
Use load_from_local loader

Config File

dataset_type = 'VisdroneDataset'
data_root = '/net/merkur/storage/deeplearning/datasets/VisDrone2020/'
input_size = 800
img_norm_cfg = dict(
    mean=[102.9801, 115.9465, 122.7717], std=[1.0, 1.0, 1.0], to_rgb=False)
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations', with_bbox=True),
    dict(type='RandomCrop', crop_size=(800, 800)),
    dict(
        type='Resize',
        img_scale=[(600, 600), (800, 800), (1000, 1000)],
        multiscale_mode='value',
        keep_ratio=True),
    dict(type='RandomFlip', flip_ratio=0.5),
    dict(
        type='Normalize',
        mean=[102.9801, 115.9465, 122.7717],
        std=[1.0, 1.0, 1.0],
        to_rgb=False),
    dict(type='Pad', size_divisor=32),
    dict(type='DefaultFormatBundle'),
    dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels'])
]
test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='MultiScaleFlipAug',
        scale_factor=1.0,
        flip=False,
        transforms=[
            dict(type='Resize', keep_ratio=True),
            dict(type='RandomFlip'),
            dict(
                type='Normalize',
                mean=[102.9801, 115.9465, 122.7717],
                std=[1.0, 1.0, 1.0],
                to_rgb=False),
            dict(type='Pad', size_divisor=32),
            dict(type='ImageToTensor', keys=['img']),
            dict(type='Collect', keys=['img'])
        ])
]
data = dict(
    samples_per_gpu=16,
    workers_per_gpu=2,
    train=dict(
        type='VisdroneDataset',
        ann_file=[
            '/net/merkur/storage/deeplearning/datasets/VisDrone2020/VisDrone2019-DET-train/VisDrone_DET_train_w_ign_reg.json',
            '/net/merkur/storage/deeplearning/datasets/VisDrone2020/VisDrone2019-DET-val/VisDrone_DET_val_w_ign_reg.json',
            '/net/merkur/storage/deeplearning/datasets/VisDrone2020/VisDrone2019-DET-test-dev/VisDrone_DET_test-dev_w_ign_reg.json',
            '/net/merkur/storage/deeplearning/datasets/VisDrone2020/VisDrone2019-MOT-train/VisDrone_MOT_train_w_ign_reg_n20.json',
            '/net/merkur/storage/deeplearning/datasets/VisDrone2020/VisDrone2019-MOT-val/VisDrone_MOT_val_w_ign_reg_n20.json'
        ],
        img_prefix=[
            '/net/merkur/storage/deeplearning/datasets/VisDrone2020/VisDrone2019-DET-train/images/',
            '/net/merkur/storage/deeplearning/datasets/VisDrone2020/VisDrone2019-DET-val/images/',
            '/net/merkur/storage/deeplearning/datasets/VisDrone2020/VisDrone2019-DET-test-dev/images/',
            '/net/merkur/storage/deeplearning/datasets/VisDrone2020/VisDrone2019-MOT-train/sequences/',
            '/net/merkur/storage/deeplearning/datasets/VisDrone2020/VisDrone2019-MOT-val/sequences/'
        ],
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(type='LoadAnnotations', with_bbox=True),
            dict(type='RandomCrop', crop_size=(800, 800)),
            dict(
                type='Resize',
                img_scale=[(600, 600), (800, 800), (1000, 1000)],
                multiscale_mode='value',
                keep_ratio=True),
            dict(type='RandomFlip', flip_ratio=0.5),
            dict(
                type='Normalize',
                mean=[102.9801, 115.9465, 122.7717],
                std=[1.0, 1.0, 1.0],
                to_rgb=False),
            dict(type='Pad', size_divisor=32),
            dict(type='DefaultFormatBundle'),
            dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels'])
        ]),
    val=dict(
        type='VisdroneDataset',
        ann_file=
        '/net/merkur/storage/deeplearning/datasets/VisDrone2020/VisDrone2019-MOT-test-dev/VisDrone_MOT_test-dev_w_ign_reg_n10.json',
        img_prefix=
        '/net/merkur/storage/deeplearning/datasets/VisDrone2020/VisDrone2019-MOT-test-dev/sequences/',
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(
                type='MultiScaleFlipAug',
                scale_factor=1.0,
                flip=False,
                transforms=[
                    dict(type='Resize', keep_ratio=True),
                    dict(type='RandomFlip'),
                    dict(
                        type='Normalize',
                        mean=[102.9801, 115.9465, 122.7717],
                        std=[1.0, 1.0, 1.0],
                        to_rgb=False),
                    dict(type='Pad', size_divisor=32),
                    dict(type='ImageToTensor', keys=['img']),
                    dict(type='Collect', keys=['img'])
                ])
        ]),
    test=dict(
        type='VisdroneDataset',
        ann_file=
        '/net/merkur/storage/deeplearning/datasets/VisDrone2020/VisDrone2019-MOT-test-dev/VisDrone_MOT_test-dev_w_ign_reg_n10.json',
        img_prefix=
        '/net/merkur/storage/deeplearning/datasets/VisDrone2020/VisDrone2019-MOT-test-dev/sequences/',
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(
                type='MultiScaleFlipAug',
                scale_factor=1.0,
                flip=False,
                transforms=[
                    dict(type='Resize', keep_ratio=True),
                    dict(type='RandomFlip'),
                    dict(
                        type='Normalize',
                        mean=[102.9801, 115.9465, 122.7717],
                        std=[1.0, 1.0, 1.0],
                        to_rgb=False),
                    dict(type='Pad', size_divisor=32),
                    dict(type='ImageToTensor', keys=['img']),
                    dict(type='Collect', keys=['img'])
                ])
        ]))
evaluation = dict(interval=1, metric='bbox')
optimizer = dict(
    type='SGD',
    lr=0.01,
    momentum=0.9,
    weight_decay=0.0001,
    paramwise_cfg=dict(bias_lr_mult=2.0, bias_decay_mult=0.0))
optimizer_config = dict(grad_clip=dict(max_norm=35, norm_type=2))
lr_config = dict(
    policy='step',
    warmup='constant',
    warmup_iters=500,
    warmup_ratio=0.3333333333333333,
    step=[8, 11])
runner = dict(type='EpochBasedRunner', max_epochs=12)
checkpoint_config = dict(interval=1)
log_config = dict(
    interval=50,
    hooks=[dict(type='TextLoggerHook'),
           dict(type='TensorboardLoggerHook')])
custom_hooks = [dict(type='NumClassCheckHook')]
dist_params = dict(backend='nccl')
log_level = 'INFO'
load_from = 'https://openmmlab.oss-cn-hangzhou.aliyuncs.com/mmdetection/v2.0/fcos/fcos_r50_caffe_fpn_gn-head_mstrain_640-800_2x_coco/fcos_r50_caffe_fpn_gn-head_mstrain_640-800_2x_coco-d92ceeea.pth'
resume_from = None
workflow = [('train', 1)]
work_dir = '/net/fulu/storage/deeplearning/users/stadan/mmdetection212/work_dirs/visdrone/det+mot/fcos_r50_caffe_fpn_gn-head_1x'
model = dict(
    type='FCOS',
    pretrained='open-mmlab://detectron/resnet50_caffe',
    backbone=dict(
        type='ResNet',
        depth=50,
        num_stages=4,
        out_indices=(0, 1, 2, 3),
        frozen_stages=1,
        norm_cfg=dict(type='BN', requires_grad=False),
        norm_eval=True,
        style='caffe'),
    neck=dict(
        type='FPN',
        in_channels=[256, 512, 1024, 2048],
        out_channels=256,
        start_level=1,
        add_extra_convs=True,
        extra_convs_on_inputs=False,
        num_outs=5,
        relu_before_extra_convs=True),
    bbox_head=dict(
        type='FCOSHead',
        num_classes=10,
        in_channels=256,
        stacked_convs=4,
        feat_channels=256,
        strides=[8, 16, 32, 64, 128],
        loss_cls=dict(
            type='FocalLoss',
            use_sigmoid=True,
            gamma=2.0,
            alpha=0.25,
            loss_weight=1.0),
        loss_bbox=dict(type='IoULoss', loss_weight=1.0),
        loss_centerness=dict(
            type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0)),
    train_cfg=dict(
        assigner=dict(
            type='MaxIoUAssigner',
            pos_iou_thr=0.5,
            neg_iou_thr=0.4,
            min_pos_iou=0,
            ignore_iof_thr=-1),
        allowed_border=-1,
        pos_weight=-1,
        debug=False),
    test_cfg=dict(
        nms_pre=1000,
        min_bbox_size=0,
        score_thr=0.05,
        nms=dict(type='nms', iou_threshold=0.5),
        max_per_img=200))
gpu_ids = range(0, 1)

Super strange: I converted the FP16 model again and now it seems to work. The conversion is probably not deterministic. I have now converted it three times, and each time the file size is a little different.

EDIT: The 3rd try has the same problem as the first try... Only the 2nd try works.
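One way to confirm that repeated conversions really produce different engines is to compare the size and hash of the saved files. This is a minimal sketch using only the Python standard library; the function names and the file names in the usage comment are hypothetical. Differing files would be consistent with the TensorRT builder picking kernels by timing them at build time, but that alone does not explain the wrong boxes.

```python
import hashlib
from pathlib import Path
from typing import Iterable, Tuple


def fingerprint(path: str, chunk_size: int = 1 << 20) -> Tuple[int, str]:
    """Return (size in bytes, SHA-256 hex digest) of a file, read in chunks."""
    digest = hashlib.sha256()
    size = 0
    with Path(path).open("rb") as fh:
        while True:
            chunk = fh.read(chunk_size)
            if not chunk:
                break
            size += len(chunk)
            digest.update(chunk)
    return size, digest.hexdigest()


def builds_differ(paths: Iterable[str]) -> bool:
    """True if at least two of the given files differ in size or content."""
    return len({fingerprint(p) for p in paths}) > 1


# Usage (hypothetical file names, one per conversion run):
# for p in ["try1.engine", "try2.engine", "try3.engine"]:
#     print(p, *fingerprint(p))
```

Keeping the fingerprint next to each engine also makes it possible to tell later which build was the working one.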

Thanks for the report. I will test it.

Can you update the conversion tools (torch2trt_dynamic, amirstan_plugin, mmdetection-to-tensorrt) and try again? The latest version is 0.5.0.
There are a lot of changes, including bug fixes.

Thank you, I will try.
Do I also have to update TensorRT and mmdetection or can I stay on TensorRT 7.2.2.3 and mmdetection 2.12?

In theory, TensorRT 7.2.2.3 and mmdetection 2.12 are supported.