shaunyuan22/CFINet

[Reimplementation] Something went wrong when I tried to train FCOS on SODA-D

CheerM opened this issue · 3 comments

Prerequisite

💬 Describe the reimplementation questions

I tried to run this:

CUDA_VISIBLE_DEVICES=1 python CFINet-master/tools/train.py \
    CFINet-master/configs/sodad-benchmarks/fcos_center-normbbox-centeronreg-giou_r50_caffe_fpn_gn-head_1x.py \
    --cfg-options work_dir=$SAVE_DIR/fcos/fcos_r50_fpn_1x

Then I got:

File "CFINet-master/mmdet/models/dense_heads/fcos_head.py", line 288, in get_targets
assert len(points) == len(self.regress_ranges)
AssertionError

Environment

mmdet 2.26.0
mmcv 1.5.0
python 3.8
pytorch 1.10.0

Expected results

No response

Additional information

  1. The SODA-D dataset was processed step by step, as described in readme.md.

  2. I modified path/to/dataset in the config files and kept everything else the same as in the latest repo.

  3. What should I do to reproduce the FCOS results on SODA-D? A timely reply would be appreciated!

It seems that the number of feature maps used for regression does not match the number of regress_ranges. Could you please share your training config, if available?
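To see why the assertion fires, the length check in get_targets can be mimicked in plain Python. This is a simplified sketch, not mmdet's code: it assumes one grid of points per FPN level (one level per entry in strides) and uses the mmdet 2.x FCOSHead default of five regress_ranges. The helper names make_points and check_targets are hypothetical.

```python
# Simplified, hypothetical re-creation of the length check in
# FCOSHead.get_targets (mmdet 2.x); not the library's actual code.
INF = 1e8

# mmdet 2.x FCOSHead default: one regress range per FPN level (5 levels).
DEFAULT_REGRESS_RANGES = ((-1, 64), (64, 128), (128, 256), (256, 512), (512, INF))


def make_points(img_h, img_w, strides):
    """Return one list of (x, y) cell-center points per FPN level."""
    points = []
    for s in strides:
        # Feature-map size for this level (ceil division).
        h, w = -(-img_h // s), -(-img_w // s)
        level = [((x + 0.5) * s, (y + 0.5) * s) for y in range(h) for x in range(w)]
        points.append(level)
    return points


def check_targets(points, regress_ranges):
    # Same condition as the failing line in the traceback above.
    assert len(points) == len(regress_ranges), (
        f"{len(points)} FPN levels but {len(regress_ranges)} regress ranges")


# Small dummy image; strides taken from the FCOS config in this issue.
points = make_points(64, 64, strides=[8, 16, 32, 64])
try:
    check_targets(points, DEFAULT_REGRESS_RANGES)
except AssertionError as e:
    print("AssertionError:", e)  # AssertionError: 4 FPN levels but 5 regress ranges
```

The image size does not matter for the failure: only the number of levels (4) versus the number of ranges (5) does.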


Sure thing, here is the config for FCOS:

dataset_type = 'SODADDataset'
data_root = '/data1/datasets/SODA/SODA-D/'
img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations', with_bbox=True),
    dict(type='Resize', img_scale=(1200, 1200), keep_ratio=True),
    dict(type='RandomFlip', flip_ratio=0.5),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='Pad', size_divisor=32),
    dict(type='DefaultFormatBundle'),
    dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels'])
]
test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='MultiScaleFlipAug',
        img_scale=(1200, 1200),
        flip=False,
        transforms=[
            dict(type='Resize', keep_ratio=True),
            dict(type='RandomFlip'),
            dict(type='Normalize', **img_norm_cfg),
            dict(type='Pad', size_divisor=32),
            dict(type='DefaultFormatBundle'),
            dict(type='Collect', keys=['img'])
        ])
]
data = dict(
    samples_per_gpu=2,
    workers_per_gpu=2,
    train=dict(
        type='SODADDataset',
        ann_file='/data1/datasets/SODA/SODA-D/divData/Annotations/train.json',
        img_prefix='/data1/datasets/SODA/SODA-D/divData/Images/train/',
        pipeline=train_pipeline,
        ori_ann_file='/data1/datasets/SODA/SODA-D/rawData/Annotations/train.json'),
    val=dict(
        type='SODADDataset',
        ann_file='/data1/datasets/SODA/SODA-D/divData/Annotations/val.json',
        img_prefix='/data1/datasets/SODA/SODA-D/divData/Images/val/',
        pipeline=test_pipeline,
        ori_ann_file='/data1/datasets/SODA/SODA-D/rawData/Annotations/val_wo_ignore.json'),
    test=dict(
        type='SODADDataset',
        ann_file='/data1/datasets/SODA/SODA-D/divData/Annotations/test.json',
        img_prefix='/data1/datasets/SODA/SODA-D/divData/Images/test/',
        pipeline=test_pipeline,
        ori_ann_file='/data1/datasets/SODA/SODA-D/rawData/Annotations/test_wo_ignore.json'))
optimizer = dict(
    type='SGD',
    lr=0.01,
    momentum=0.9,
    weight_decay=0.0001,
    paramwise_cfg=dict(bias_lr_mult=2.0, bias_decay_mult=0.0))
optimizer_config = dict(grad_clip=dict(max_norm=35, norm_type=2))
lr_config = dict(
    policy='step',
    warmup='linear',
    warmup_iters=500,
    warmup_ratio=0.001,
    step=[8, 11])
runner = dict(type='EpochBasedRunner', max_epochs=12)
checkpoint_config = dict(interval=1)
log_config = dict(interval=50, hooks=[dict(type='TextLoggerHook')])
custom_hooks = [dict(type='NumClassCheckHook')]
dist_params = dict(backend='nccl')
log_level = 'INFO'
load_from = None
resume_from = None
workflow = [('train', 1)]
opencv_num_threads = 0
mp_start_method = 'fork'
auto_scale_lr = dict(enable=False, base_batch_size=16)
model = dict(
    type='FCOS',
    backbone=dict(
        type='ResNet',
        depth=50,
        num_stages=4,
        out_indices=(0, 1, 2, 3),
        frozen_stages=1,
        norm_cfg=dict(type='BN', requires_grad=False),
        norm_eval=True,
        style='caffe',
        init_cfg=dict(
            type='Pretrained',
            checkpoint='open-mmlab://detectron2/resnet50_caffe')),
    neck=dict(
        type='FPN',
        in_channels=[256, 512, 1024, 2048],
        out_channels=256,
        start_level=1,
        add_extra_convs='on_output',
        num_outs=4,
        relu_before_extra_convs=True),
    bbox_head=dict(
        type='FCOSHead',
        num_classes=9,
        in_channels=256,
        stacked_convs=4,
        feat_channels=256,
        strides=[8, 16, 32, 64],
        norm_on_bbox=True,
        centerness_on_reg=True,
        dcn_on_last_conv=False,
        center_sampling=True,
        conv_bias=True,
        loss_cls=dict(
            type='FocalLoss',
            use_sigmoid=True,
            gamma=2.0,
            alpha=0.25,
            loss_weight=1.0),
        loss_bbox=dict(type='GIoULoss', loss_weight=1.0),
        loss_centerness=dict(
            type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0)),
    train_cfg=dict(
        assigner=dict(
            type='MaxIoUAssigner',
            pos_iou_thr=0.5,
            neg_iou_thr=0.4,
            min_pos_iou=0,
            ignore_iof_thr=-1),
        allowed_border=-1,
        pos_weight=-1,
        debug=False),
    test_cfg=dict(
        nms_pre=1000,
        min_bbox_size=0,
        score_thr=0.05,
        nms=dict(type='nms', iou_threshold=0.6),
        max_per_img=100))
work_dir = '../soda_d_results_mmdet2/fcos/fcos_r50_fpn_1x'
auto_resume = False
gpu_ids = [0]
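The number of levels the head sees is decided by the neck settings in this config. The sketch below re-derives how mmdet 2.x's FPN counts its outputs, assuming the default end_level=-1: ResNet stages have strides 4, 8, 16, 32, so start_level=1 drops the stride-4 stage, leaving three lateral levels plus extra levels up to num_outs. The function fpn_levels is a hypothetical helper, not an mmdet API.

```python
# Hypothetical helper mirroring how mmdet 2.x's FPN derives its number of
# output levels (assumes end_level=-1, the default).
def fpn_levels(in_channels, start_level, num_outs):
    backbone_end_level = len(in_channels)    # end_level=-1 -> use all backbone inputs
    used = backbone_end_level - start_level  # lateral levels taken from the backbone
    assert num_outs >= used, "num_outs must cover all used backbone levels"
    extra = num_outs - used                  # extra levels added on top (add_extra_convs)
    return used, extra


# Values from the neck/head of the FCOS config above.
used, extra = fpn_levels([256, 512, 1024, 2048], start_level=1, num_outs=4)
strides = [8, 16, 32, 64]
print(used, extra, used + extra == len(strides))  # 3 1 True
```

So this config produces 4 levels (strides 8 to 64), which is consistent with its strides but not with a head that still expects 5 regress ranges.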

Also, I ran into other issues, such as the loss turning into NaN, when training RetinaNet and RepPoints. The config for RetinaNet is shown below:

model = dict(
    type='RetinaNet',
    backbone=dict(
        type='ResNet',
        depth=50,
        num_stages=4,
        out_indices=(0, 1, 2, 3),
        frozen_stages=1,
        norm_cfg=dict(type='BN', requires_grad=True),
        norm_eval=True,
        style='pytorch',
        init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50')),
    neck=dict(
        type='FPN',
        in_channels=[256, 512, 1024, 2048],
        out_channels=256,
        start_level=1,
        add_extra_convs='on_input',
        num_outs=4),
    bbox_head=dict(
        type='RetinaHead',
        num_classes=9,
        in_channels=256,
        stacked_convs=4,
        feat_channels=256,
        anchor_generator=dict(
            type='AnchorGenerator',
            octave_base_scale=2,
            scales_per_octave=3,
            ratios=[0.5, 1.0, 2.0],
            strides=[8, 16, 32, 64]),
        bbox_coder=dict(
            type='DeltaXYWHBBoxCoder',
            target_means=[0.0, 0.0, 0.0, 0.0],
            target_stds=[1.0, 1.0, 1.0, 1.0]),
        loss_cls=dict(
            type='FocalLoss',
            use_sigmoid=True,
            gamma=2.0,
            alpha=0.25,
            loss_weight=1.0),
        loss_bbox=dict(type='L1Loss', loss_weight=1.0)),
    train_cfg=dict(
        assigner=dict(
            type='MaxIoUAssigner',
            pos_iou_thr=0.5,
            neg_iou_thr=0.4,
            min_pos_iou=0,
            ignore_iof_thr=-1),
        allowed_border=-1,
        pos_weight=-1,
        debug=False),
    test_cfg=dict(
        nms_pre=1000,
        min_bbox_size=0,
        score_thr=0.05,
        nms=dict(type='nms', iou_threshold=0.5),
        max_per_img=100))
dataset_type = 'SODADDataset'
data_root = '/data1/datasets/SODA/SODA-D/'
img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations', with_bbox=True),
    dict(type='Resize', img_scale=(1200, 1200), keep_ratio=True),
    dict(type='RandomFlip', flip_ratio=0.5),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='Pad', size_divisor=32),
    dict(type='DefaultFormatBundle'),
    dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels'])
]
test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='MultiScaleFlipAug',
        img_scale=(1200, 1200),
        flip=False,
        transforms=[
            dict(type='Resize', keep_ratio=True),
            dict(type='RandomFlip'),
            dict(type='Normalize', **img_norm_cfg),
            dict(type='Pad', size_divisor=32),
            dict(type='DefaultFormatBundle'),
            dict(type='Collect', keys=['img'])
        ])
]
data = dict(
    samples_per_gpu=8,
    workers_per_gpu=2,
    train=dict(
        type='SODADDataset',
        ann_file='/data1/datasets/SODA/SODA-D/divData/Annotations/train.json',
        img_prefix='/data1/datasets/SODA/SODA-D/divData/Images/train/',
        pipeline=train_pipeline,
        ori_ann_file='/data1/datasets/SODA/SODA-D/rawData/Annotations/train.json'),
    val=dict(
        type='SODADDataset',
        ann_file='/data1/datasets/SODA/SODA-D/divData/Annotations/val.json',
        img_prefix='/data1/datasets/SODA/SODA-D/divData/Images/val/',
        pipeline=test_pipeline,
        ori_ann_file='/data1/datasets/SODA/SODA-D/rawData/Annotations/val_wo_ignore.json'),
    test=dict(
        type='SODADDataset',
        ann_file='/data1/datasets/SODA/SODA-D/divData/Annotations/test.json',
        img_prefix='/data1/datasets/SODA/SODA-D/divData/Images/test/',
        pipeline=test_pipeline,
        ori_ann_file='/data1/datasets/SODA/SODA-D/rawData/Annotations/test_wo_ignore.json'))
optimizer = dict(type='SGD', lr=0.01, momentum=0.9, weight_decay=0.0001)
optimizer_config = dict(grad_clip=None)
lr_config = dict(
    policy='step',
    warmup='linear',
    warmup_iters=500,
    warmup_ratio=0.001,
    step=[8, 11])
runner = dict(type='EpochBasedRunner', max_epochs=1000)
checkpoint_config = dict(interval=1)
log_config = dict(interval=50, hooks=[dict(type='TextLoggerHook')])
custom_hooks = [dict(type='NumClassCheckHook')]
dist_params = dict(backend='nccl')
log_level = 'INFO'
load_from = None
resume_from = None
workflow = [('train', 1)]
opencv_num_threads = 0
mp_start_method = 'fork'
auto_scale_lr = dict(enable=False, base_batch_size=16)
work_dir = '../soda_d_results_mmdet2/retinanet/retinanet_r50_fpn_1x'
auto_resume = False
gpu_ids = [0]
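For reference, the anchor_generator settings above determine the anchor sizes per level. The sketch below re-derives, under the assumption that it matches mmdet 2.x's AnchorGenerator for square (ratio 1.0) anchors, how octave_base_scale=2 and scales_per_octave=3 become side lengths: each level's base size equals its stride, and the scales are octave_base_scale * 2^(i / scales_per_octave). This config uses octave_base_scale=2, whereas mmdet's stock RetinaNet configs use 4. The function anchor_sizes is hypothetical.

```python
# Sketch of how mmdet 2.x's AnchorGenerator turns octave_base_scale /
# scales_per_octave into anchor side lengths (square anchors, ratio=1.0 only).
def anchor_sizes(strides, octave_base_scale=2, scales_per_octave=3):
    scales = [octave_base_scale * 2 ** (i / scales_per_octave)
              for i in range(scales_per_octave)]
    # Base size of each level equals its stride; side length = stride * scale.
    return {s: [round(s * sc, 1) for sc in scales] for s in strides}


for stride, sizes in anchor_sizes([8, 16, 32, 64]).items():
    print(stride, sizes)  # first line: 8 [16.0, 20.2, 25.4]
```

With these settings the smallest square anchors are about 16 px on the stride-8 level.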

Thank you for your reply.

Actually, I'm really confused, because I simply followed readme.md for everything (copied the repo, installed the corresponding environment, etc.). I made no major changes to the code, and the results are still far from correct.

For FCOS, the default number of regress_ranges is 5, which does not match the 4 FPN output levels in your config; see:

regress_ranges=((-1, 64), (64, 128), (128, 256), (256, 512),
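Two ways to make the levels agree are sketched below as plain Python dicts. Option 1 keeps 4 FPN levels and gives bbox_head an explicit 4-entry regress_ranges; those four ranges (the default's last two merged into (256, INF)) are an illustrative assumption, not values confirmed by the CFINet authors. Option 2 keeps the five default ranges and restores five levels (num_outs=5, strides=[8, 16, 32, 64, 128]), as in mmdet's stock FCOS configs. The check levels_consistent is a hypothetical helper, not an mmdet API.

```python
# Standalone sketch; INF mirrors the constant defined in mmdet 2.x fcos_head.py.
INF = 1e8

# Option 1 (hypothetical ranges): keep 4 FPN levels and give the head 4 ranges.
neck_4_levels = dict(type='FPN', num_outs=4)
bbox_head_4_levels = dict(
    type='FCOSHead',
    strides=[8, 16, 32, 64],
    # One (min, max) regression range per FPN output; values are an assumption.
    regress_ranges=((-1, 64), (64, 128), (128, 256), (256, INF)))

# Option 2: keep mmdet's 5 default ranges and restore 5 FPN levels.
neck_5_levels = dict(type='FPN', num_outs=5)
bbox_head_5_levels = dict(type='FCOSHead', strides=[8, 16, 32, 64, 128])
DEFAULT_REGRESS_RANGES = ((-1, 64), (64, 128), (128, 256), (256, 512), (512, INF))


def levels_consistent(neck, head, regress_ranges):
    """True if FPN outputs, head strides and regress ranges all agree in length."""
    return neck['num_outs'] == len(head['strides']) == len(regress_ranges)


print(levels_consistent(neck_4_levels, bbox_head_4_levels,
                        bbox_head_4_levels['regress_ranges']))  # True
print(levels_consistent(neck_5_levels, bbox_head_5_levels,
                        DEFAULT_REGRESS_RANGES))                # True
```

In the actual config file, only the regress_ranges line would be added to the existing bbox_head dict (writing 1e8 directly instead of INF), or num_outs and strides would be changed together for option 2.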

For RetinaNet, you could increase warmup_iters, because single-stage methods tend to be unstable early in training.
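To see what a longer warmup changes, the sketch below re-derives mmcv 1.x's 'linear' warmup rule (as implemented in LrUpdaterHook): the LR starts at base_lr * warmup_ratio and rises linearly to base_lr over warmup_iters iterations. base_lr=0.01 and warmup_ratio=0.001 come from the RetinaNet config; the longer warmup of 2000 iterations is a hypothetical value for comparison, not a recommendation from the maintainers.

```python
# Re-derivation of mmcv 1.x's 'linear' warmup rule (LrUpdaterHook); a sketch,
# not the library code. Defaults match the RetinaNet config in this issue.
def linear_warmup_lr(cur_iter, base_lr=0.01, warmup_iters=500, warmup_ratio=0.001):
    if cur_iter >= warmup_iters:
        return base_lr  # warmup finished; the 'step' policy takes over
    k = (1 - cur_iter / warmup_iters) * (1 - warmup_ratio)
    return base_lr * (1 - k)


# Compare the current warmup (500 iters) with a hypothetical 2000-iter warmup.
for it in (0, 250, 500, 1000):
    print(it, linear_warmup_lr(it), linear_warmup_lr(it, warmup_iters=2000))
```

With warmup_iters=2000, the LR at iteration 500 is about 0.0025 instead of the full 0.01, giving the randomly initialized head more iterations at a small LR. In the config this would be a change to lr_config, e.g. warmup_iters=2000 with the other fields unchanged.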