[Reimplementation] something wrong when I tried to train FCOS on SODA-D
CheerM opened this issue · 3 comments
Prerequisite
- I have searched Issues and Discussions but cannot get the expected help.
- I have read the FAQ documentation but cannot get the expected help.
- The bug has not been fixed in the latest version (master) or latest version (3.x).
💬 Describe the reimplementation questions
I tried to run this:
CUDA_VISIBLE_DEVICES=1 python CFINet-master/tools/train.py \
    CFINet-master/configs/sodad-benchmarks/fcos_center-normbbox-centeronreg-giou_r50_caffe_fpn_gn-head_1x.py \
    --cfg-options work_dir=$SAVE_DIR/fcos/fcos_r50_fpn_1x
then got:
File "CFINet-master/mmdet/models/dense_heads/fcos_head.py", line 288, in get_targets
assert len(points) == len(self.regress_ranges)
AssertionError
Environment
mmdet 2.26.0
mmcv 1.5.0
python 3.8
pytorch 1.10.0
Expected results
No response
Additional information
- The SODA-D dataset was processed step by step, as described in readme.md.
- I modified path/to/dataset in the config files and kept everything else the same as the latest repo.
- What should I do to reproduce the results of FCOS on SODA-D? A timely reply would be appreciated!
It seems that the number of feature maps used for regression does not match the number of regress_ranges. Could you please share the training config, if available?
Sure thing, here is the cfg for fcos
dataset_type = 'SODADDataset'
data_root = '/data1/datasets/SODA/SODA-D/'
img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations', with_bbox=True),
    dict(type='Resize', img_scale=(1200, 1200), keep_ratio=True),
    dict(type='RandomFlip', flip_ratio=0.5),
    dict(
        type='Normalize',
        mean=[123.675, 116.28, 103.53],
        std=[58.395, 57.12, 57.375],
        to_rgb=True),
    dict(type='Pad', size_divisor=32),
    dict(type='DefaultFormatBundle'),
    dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels'])
]
test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='MultiScaleFlipAug',
        img_scale=(1200, 1200),
        flip=False,
        transforms=[
            dict(type='Resize', keep_ratio=True),
            dict(type='RandomFlip'),
            dict(
                type='Normalize',
                mean=[123.675, 116.28, 103.53],
                std=[58.395, 57.12, 57.375],
                to_rgb=True),
            dict(type='Pad', size_divisor=32),
            dict(type='DefaultFormatBundle'),
            dict(type='Collect', keys=['img'])
        ])
]
data = dict(
    samples_per_gpu=2,
    workers_per_gpu=2,
    train=dict(
        type='SODADDataset',
        ann_file='/data1/datasets/SODA/SODA-D/divData/Annotations/train.json',
        img_prefix='/data1/datasets/SODA/SODA-D/divData/Images/train/',
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(type='LoadAnnotations', with_bbox=True),
            dict(type='Resize', img_scale=(1200, 1200), keep_ratio=True),
            dict(type='RandomFlip', flip_ratio=0.5),
            dict(
                type='Normalize',
                mean=[123.675, 116.28, 103.53],
                std=[58.395, 57.12, 57.375],
                to_rgb=True),
            dict(type='Pad', size_divisor=32),
            dict(type='DefaultFormatBundle'),
            dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels'])
        ],
        ori_ann_file=
        '/data1/datasets/SODA/SODA-D/rawData/Annotations/train.json'),
    val=dict(
        type='SODADDataset',
        ann_file='/data1/datasets/SODA/SODA-D/divData/Annotations/val.json',
        img_prefix='/data1/datasets/SODA/SODA-D/divData/Images/val/',
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(
                type='MultiScaleFlipAug',
                img_scale=(1200, 1200),
                flip=False,
                transforms=[
                    dict(type='Resize', keep_ratio=True),
                    dict(type='RandomFlip'),
                    dict(
                        type='Normalize',
                        mean=[123.675, 116.28, 103.53],
                        std=[58.395, 57.12, 57.375],
                        to_rgb=True),
                    dict(type='Pad', size_divisor=32),
                    dict(type='DefaultFormatBundle'),
                    dict(type='Collect', keys=['img'])
                ])
        ],
        ori_ann_file=
        '/data1/datasets/SODA/SODA-D/rawData/Annotations/val_wo_ignore.json'),
    test=dict(
        type='SODADDataset',
        ann_file='/data1/datasets/SODA/SODA-D/divData/Annotations/test.json',
        img_prefix='/data1/datasets/SODA/SODA-D/divData/Images/test/',
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(
                type='MultiScaleFlipAug',
                img_scale=(1200, 1200),
                flip=False,
                transforms=[
                    dict(type='Resize', keep_ratio=True),
                    dict(type='RandomFlip'),
                    dict(
                        type='Normalize',
                        mean=[123.675, 116.28, 103.53],
                        std=[58.395, 57.12, 57.375],
                        to_rgb=True),
                    dict(type='Pad', size_divisor=32),
                    dict(type='DefaultFormatBundle'),
                    dict(type='Collect', keys=['img'])
                ])
        ],
        ori_ann_file=
        '/data1/datasets/SODA/SODA-D/rawData/Annotations/test_wo_ignore.json'))
optimizer = dict(
    type='SGD',
    lr=0.01,
    momentum=0.9,
    weight_decay=0.0001,
    paramwise_cfg=dict(bias_lr_mult=2.0, bias_decay_mult=0.0))
optimizer_config = dict(grad_clip=dict(max_norm=35, norm_type=2))
lr_config = dict(
    policy='step',
    warmup='linear',
    warmup_iters=500,
    warmup_ratio=0.001,
    step=[8, 11])
runner = dict(type='EpochBasedRunner', max_epochs=12)
checkpoint_config = dict(interval=1)
log_config = dict(interval=50, hooks=[dict(type='TextLoggerHook')])
custom_hooks = [dict(type='NumClassCheckHook')]
dist_params = dict(backend='nccl')
log_level = 'INFO'
load_from = None
resume_from = None
workflow = [('train', 1)]
opencv_num_threads = 0
mp_start_method = 'fork'
auto_scale_lr = dict(enable=False, base_batch_size=16)
model = dict(
    type='FCOS',
    backbone=dict(
        type='ResNet',
        depth=50,
        num_stages=4,
        out_indices=(0, 1, 2, 3),
        frozen_stages=1,
        norm_cfg=dict(type='BN', requires_grad=False),
        norm_eval=True,
        style='caffe',
        init_cfg=dict(
            type='Pretrained',
            checkpoint='open-mmlab://detectron2/resnet50_caffe')),
    neck=dict(
        type='FPN',
        in_channels=[256, 512, 1024, 2048],
        out_channels=256,
        start_level=1,
        add_extra_convs='on_output',
        num_outs=4,
        relu_before_extra_convs=True),
    bbox_head=dict(
        type='FCOSHead',
        num_classes=9,
        in_channels=256,
        stacked_convs=4,
        feat_channels=256,
        strides=[8, 16, 32, 64],
        norm_on_bbox=True,
        centerness_on_reg=True,
        dcn_on_last_conv=False,
        center_sampling=True,
        conv_bias=True,
        loss_cls=dict(
            type='FocalLoss',
            use_sigmoid=True,
            gamma=2.0,
            alpha=0.25,
            loss_weight=1.0),
        loss_bbox=dict(type='GIoULoss', loss_weight=1.0),
        loss_centerness=dict(
            type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0)),
    train_cfg=dict(
        assigner=dict(
            type='MaxIoUAssigner',
            pos_iou_thr=0.5,
            neg_iou_thr=0.4,
            min_pos_iou=0,
            ignore_iof_thr=-1),
        allowed_border=-1,
        pos_weight=-1,
        debug=False),
    test_cfg=dict(
        nms_pre=1000,
        min_bbox_size=0,
        score_thr=0.05,
        nms=dict(type='nms', iou_threshold=0.6),
        max_per_img=100))
work_dir = '../soda_d_results_mmdet2/fcos/fcos_r50_fpn_1x'
auto_resume = False
gpu_ids = [0]
Also, other issues, such as the loss turning into NaN, occurred while training RetinaNet and RepPoints. Hence, the cfg for training RetinaNet is also shown below:
model = dict(
    type='RetinaNet',
    backbone=dict(
        type='ResNet',
        depth=50,
        num_stages=4,
        out_indices=(0, 1, 2, 3),
        frozen_stages=1,
        norm_cfg=dict(type='BN', requires_grad=True),
        norm_eval=True,
        style='pytorch',
        init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50')),
    neck=dict(
        type='FPN',
        in_channels=[256, 512, 1024, 2048],
        out_channels=256,
        start_level=1,
        add_extra_convs='on_input',
        num_outs=4),
    bbox_head=dict(
        type='RetinaHead',
        num_classes=9,
        in_channels=256,
        stacked_convs=4,
        feat_channels=256,
        anchor_generator=dict(
            type='AnchorGenerator',
            octave_base_scale=2,
            scales_per_octave=3,
            ratios=[0.5, 1.0, 2.0],
            strides=[8, 16, 32, 64]),
        bbox_coder=dict(
            type='DeltaXYWHBBoxCoder',
            target_means=[0.0, 0.0, 0.0, 0.0],
            target_stds=[1.0, 1.0, 1.0, 1.0]),
        loss_cls=dict(
            type='FocalLoss',
            use_sigmoid=True,
            gamma=2.0,
            alpha=0.25,
            loss_weight=1.0),
        loss_bbox=dict(type='L1Loss', loss_weight=1.0)),
    train_cfg=dict(
        assigner=dict(
            type='MaxIoUAssigner',
            pos_iou_thr=0.5,
            neg_iou_thr=0.4,
            min_pos_iou=0,
            ignore_iof_thr=-1),
        allowed_border=-1,
        pos_weight=-1,
        debug=False),
    test_cfg=dict(
        nms_pre=1000,
        min_bbox_size=0,
        score_thr=0.05,
        nms=dict(type='nms', iou_threshold=0.5),
        max_per_img=100))
dataset_type = 'SODADDataset'
data_root = '/data1/datasets/SODA/SODA-D/'
img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations', with_bbox=True),
    dict(type='Resize', img_scale=(1200, 1200), keep_ratio=True),
    dict(type='RandomFlip', flip_ratio=0.5),
    dict(
        type='Normalize',
        mean=[123.675, 116.28, 103.53],
        std=[58.395, 57.12, 57.375],
        to_rgb=True),
    dict(type='Pad', size_divisor=32),
    dict(type='DefaultFormatBundle'),
    dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels'])
]
test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='MultiScaleFlipAug',
        img_scale=(1200, 1200),
        flip=False,
        transforms=[
            dict(type='Resize', keep_ratio=True),
            dict(type='RandomFlip'),
            dict(
                type='Normalize',
                mean=[123.675, 116.28, 103.53],
                std=[58.395, 57.12, 57.375],
                to_rgb=True),
            dict(type='Pad', size_divisor=32),
            dict(type='DefaultFormatBundle'),
            dict(type='Collect', keys=['img'])
        ])
]
data = dict(
    samples_per_gpu=8,
    workers_per_gpu=2,
    train=dict(
        type='SODADDataset',
        ann_file='/data1/datasets/SODA/SODA-D/divData/Annotations/train.json',
        img_prefix='/data1/datasets/SODA/SODA-D/divData/Images/train/',
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(type='LoadAnnotations', with_bbox=True),
            dict(type='Resize', img_scale=(1200, 1200), keep_ratio=True),
            dict(type='RandomFlip', flip_ratio=0.5),
            dict(
                type='Normalize',
                mean=[123.675, 116.28, 103.53],
                std=[58.395, 57.12, 57.375],
                to_rgb=True),
            dict(type='Pad', size_divisor=32),
            dict(type='DefaultFormatBundle'),
            dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels'])
        ],
        ori_ann_file=
        '/data1/datasets/SODA/SODA-D/rawData/Annotations/train.json'),
    val=dict(
        type='SODADDataset',
        ann_file='/data1/datasets/SODA/SODA-D/divData/Annotations/val.json',
        img_prefix='/data1/datasets/SODA/SODA-D/divData/Images/val/',
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(
                type='MultiScaleFlipAug',
                img_scale=(1200, 1200),
                flip=False,
                transforms=[
                    dict(type='Resize', keep_ratio=True),
                    dict(type='RandomFlip'),
                    dict(
                        type='Normalize',
                        mean=[123.675, 116.28, 103.53],
                        std=[58.395, 57.12, 57.375],
                        to_rgb=True),
                    dict(type='Pad', size_divisor=32),
                    dict(type='DefaultFormatBundle'),
                    dict(type='Collect', keys=['img'])
                ])
        ],
        ori_ann_file=
        '/data1/datasets/SODA/SODA-D/rawData/Annotations/val_wo_ignore.json'),
    test=dict(
        type='SODADDataset',
        ann_file='/data1/datasets/SODA/SODA-D/divData/Annotations/test.json',
        img_prefix='/data1/datasets/SODA/SODA-D/divData/Images/test/',
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(
                type='MultiScaleFlipAug',
                img_scale=(1200, 1200),
                flip=False,
                transforms=[
                    dict(type='Resize', keep_ratio=True),
                    dict(type='RandomFlip'),
                    dict(
                        type='Normalize',
                        mean=[123.675, 116.28, 103.53],
                        std=[58.395, 57.12, 57.375],
                        to_rgb=True),
                    dict(type='Pad', size_divisor=32),
                    dict(type='DefaultFormatBundle'),
                    dict(type='Collect', keys=['img'])
                ])
        ],
        ori_ann_file=
        '/data1/datasets/SODA/SODA-D/rawData/Annotations/test_wo_ignore.json'))
optimizer = dict(type='SGD', lr=0.01, momentum=0.9, weight_decay=0.0001)
optimizer_config = dict(grad_clip=None)
lr_config = dict(
    policy='step',
    warmup='linear',
    warmup_iters=500,
    warmup_ratio=0.001,
    step=[8, 11])
runner = dict(type='EpochBasedRunner', max_epochs=1000)
checkpoint_config = dict(interval=1)
log_config = dict(interval=50, hooks=[dict(type='TextLoggerHook')])
custom_hooks = [dict(type='NumClassCheckHook')]
dist_params = dict(backend='nccl')
log_level = 'INFO'
load_from = None
resume_from = None
workflow = [('train', 1)]
opencv_num_threads = 0
mp_start_method = 'fork'
auto_scale_lr = dict(enable=False, base_batch_size=16)
work_dir = '../soda_d_results_mmdet2/retinanet/retinanet_r50_fpn_1x'
auto_resume = False
gpu_ids = [0]
Thank you for your reply.
Actually, I'm quite confused, because I simply followed readme.md (cloned the repo, installed the corresponding environment, etc.) and made no major changes to the code, yet the results are still far from correct.
For FCOS, the default number of regress_ranges is 5, which does not match the number of FPN output feature maps in your config, namely 4; see
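The mismatch behind the assertion can be reproduced without launching training. The snippet below is only a minimal sketch, not code from the repo: `check_fcos_levels` is a hypothetical helper, and `four_level_ranges` is one illustrative way to cover four levels (the ranges actually used for the SODA-D benchmark may differ). The default tuple mirrors the `FCOSHead` default in MMDetection 2.x.

```python
INF = 1e8  # large sentinel for "no upper bound", as in fcos_head.py

# FCOSHead default in MMDetection 2.x: one (min, max) range per feature level.
DEFAULT_REGRESS_RANGES = (
    (-1, 64), (64, 128), (128, 256), (256, 512), (512, INF))


def check_fcos_levels(strides, regress_ranges):
    """Hypothetical helper: True if each feature level has one range."""
    return len(strides) == len(regress_ranges)


# Values taken from the posted config: FPN num_outs=4, four strides.
strides = [8, 16, 32, 64]

# Five default ranges vs. four levels: this is the failing condition.
print(check_fcos_levels(strides, DEFAULT_REGRESS_RANGES))

# Illustrative four-level ranges (an assumption, not the repo's values).
four_level_ranges = ((-1, 64), (64, 128), (128, 256), (256, INF))
print(check_fcos_levels(strides, four_level_ranges))
```

In the config, a tuple like this would go into `bbox_head` as `regress_ranges=...`, next to `strides`, so that both have the same length as `num_outs` in the neck.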
For RetinaNet, you could increase warmup_iters, because single-stage methods can be unstable during training.
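To see what a longer warmup changes, here is a rough sketch of the linear warmup schedule, written to follow the formula used by the mmcv 1.x LR updater hook as I understand it. `linear_warmup_lr` is a hypothetical standalone function, and the value 2000 is only an example, not a recommended setting.

```python
def linear_warmup_lr(base_lr, cur_iter, warmup_iters, warmup_ratio):
    """Learning rate at cur_iter under linear warmup (standalone sketch)."""
    if cur_iter >= warmup_iters:
        return base_lr
    # k shrinks from (1 - warmup_ratio) to 0 as warmup progresses.
    k = (1 - cur_iter / warmup_iters) * (1 - warmup_ratio)
    return base_lr * (1 - k)


base_lr = 0.01        # from the posted RetinaNet config
warmup_ratio = 0.001  # from the posted RetinaNet config

# Compare the posted warmup length (500) with a longer, illustrative one.
for warmup_iters in (500, 2000):
    lr_at_250 = linear_warmup_lr(base_lr, 250, warmup_iters, warmup_ratio)
    print(f"warmup_iters={warmup_iters}: lr at iter 250 = {lr_at_250:.6f}")
```

With the longer warmup, the learning rate at the same iteration is several times smaller, which gives the randomly initialized head more time to settle before the full rate is applied. The posted RetinaNet config also sets `grad_clip=None`; enabling gradient clipping, as the FCOS config does, is another commonly suggested remedy for NaN losses, though whether it is needed here is an assumption.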