Chasel-Tsui/mmrotate-dcfl

Some accuracy issues

bobyoung159 opened this issue · 4 comments

When I used your configuration file dotav1_test_dcfl_psc_r50_3x.py for experiments on DOTAv1, the online validation map was only 71.03, which is far from the accuracy of 74.26 in your paper. May I ask if there is something wrong with me,can you provide your checkpoint?

Hi, could you please provide more details? such as the batch size, learning rate, etc. Besides, it seems that you used the PSC loss from the config name you povided, I am not sure whether it can work well with the DCFL.

I modify the batch_size from 2 to 6,The learning rate is 1.7 times the original, and no additional modifications have been made. This is my experimental log,can you help me check it?The PSC loss is another experiment.


sys.platform: win32
Python: 3.9.16 (main, Mar 8 2023, 10:39:24) [MSC v.1916 64 bit (AMD64)]
CUDA available: True
GPU 0: NVIDIA GeForce RTX 3090
CUDA_HOME: C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.5
NVCC: Cuda compilation tools, release 11.5, V11.5.119
MSVC: 用于 x64 的 Microsoft (R) C/C++ 优化编译器 19.35.32215 版
GCC: n/a
PyTorch: 1.12.1
PyTorch compiling details: PyTorch built with:

  • C++ Version: 199711
  • MSVC 192829337
  • Intel(R) Math Kernel Library Version 2020.0.2 Product Build 20200624 for Intel(R) 64 architecture applications
  • Intel(R) MKL-DNN v2.6.0 (Git Hash 52b5f107dd9cf10910aaa19cb47f3abf9b349815)
  • OpenMP 2019
  • LAPACK is enabled (usually provided by MKL)
  • CPU capability usage: AVX2
  • CUDA Runtime 11.3
  • NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_37,code=compute_37
  • CuDNN 8.3.2 (built against CUDA 11.5)
  • Magma 2.5.4
  • Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.3, CUDNN_VERSION=8.3.2, CXX_COMPILER=C:/cb/pytorch_1000000000000/work/tmp_bin/sccache-cl.exe, CXX_FLAGS=/DWIN32 /D_WINDOWS /GR /EHsc /w /bigobj -DUSE_PTHREADPOOL -openmp:experimental -IC:/cb/pytorch_1000000000000/work/mkl/include -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOCUPTI -DUSE_FBGEMM -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.12.1, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=OFF, USE_MPI=OFF, USE_NCCL=OFF, USE_NNPACK=OFF, USE_OPENMP=ON, USE_ROCM=OFF,

TorchVision: 0.13.1
OpenCV: 4.7.0
MMCV: 1.6.0
MMCV Compiler: MSVC 192930140
MMCV CUDA Compiler: 11.3
MMRotate: 0.3.4+unknown

2023-05-16 19:49:02,800 - mmrotate - INFO - Distributed training: False
2023-05-16 19:49:02,938 - mmrotate - INFO - Config:
dataset_type = 'DOTADataset'
data_root = 'D:/split_ss_dota/'
img_norm_cfg = dict(
mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
train_pipeline = [
dict(type='LoadImageFromFile'),
dict(type='LoadAnnotations', with_bbox=True),
dict(type='RResize', img_scale=(1024, 1024)),
dict(
type='RRandomFlip',
flip_ratio=[0.25, 0.25, 0.25],
direction=['horizontal', 'vertical', 'diagonal'],
version='le135'),
dict(
type='Normalize',
mean=[123.675, 116.28, 103.53],
std=[58.395, 57.12, 57.375],
to_rgb=True),
dict(type='Pad', size_divisor=32),
dict(type='DefaultFormatBundle'),
dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels'])
]
test_pipeline = [
dict(type='LoadImageFromFile'),
dict(
type='MultiScaleFlipAug',
img_scale=(1024, 1024),
flip=False,
transforms=[
dict(type='RResize'),
dict(
type='Normalize',
mean=[123.675, 116.28, 103.53],
std=[58.395, 57.12, 57.375],
to_rgb=True),
dict(type='Pad', size_divisor=32),
dict(type='DefaultFormatBundle'),
dict(type='Collect', keys=['img'])
])
]
data = dict(
samples_per_gpu=6,
workers_per_gpu=2,
train=dict(
type='DOTADataset',
ann_file='D:/split_ss_dota/trainval/labelTxt/',
img_prefix='D:/split_ss_dota/trainval/images/',
pipeline=[
dict(type='LoadImageFromFile'),
dict(type='LoadAnnotations', with_bbox=True),
dict(type='RResize', img_scale=(1024, 1024)),
dict(
type='RRandomFlip',
flip_ratio=[0.25, 0.25, 0.25],
direction=['horizontal', 'vertical', 'diagonal'],
version='le135'),
dict(
type='Normalize',
mean=[123.675, 116.28, 103.53],
std=[58.395, 57.12, 57.375],
to_rgb=True),
dict(type='Pad', size_divisor=32),
dict(type='DefaultFormatBundle'),
dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels'])
],
version='le135'),
val=dict(
type='DOTADataset',
ann_file='D:/split_ss_dota/trainval/labelTxt/',
img_prefix='D:/split_ss_dota/trainval/images/',
pipeline=[
dict(type='LoadImageFromFile'),
dict(
type='MultiScaleFlipAug',
img_scale=(1024, 1024),
flip=False,
transforms=[
dict(type='RResize'),
dict(
type='Normalize',
mean=[123.675, 116.28, 103.53],
std=[58.395, 57.12, 57.375],
to_rgb=True),
dict(type='Pad', size_divisor=32),
dict(type='DefaultFormatBundle'),
dict(type='Collect', keys=['img'])
])
],
version='le135'),
test=dict(
type='DOTADataset',
ann_file='D:/split_ss_dota/test/images/',
img_prefix='D:/split_ss_dota/test/images/',
pipeline=[
dict(type='LoadImageFromFile'),
dict(
type='MultiScaleFlipAug',
img_scale=(1024, 1024),
flip=False,
transforms=[
dict(type='RResize'),
dict(
type='Normalize',
mean=[123.675, 116.28, 103.53],
std=[58.395, 57.12, 57.375],
to_rgb=True),
dict(type='Pad', size_divisor=32),
dict(type='DefaultFormatBundle'),
dict(type='Collect', keys=['img'])
])
],
version='le135'))
optimizer = dict(type='SGD', lr=0.00425, momentum=0.9, weight_decay=0.0001)
optimizer_config = dict(grad_clip=dict(max_norm=35, norm_type=2))
lr_config = dict(
policy='step',
warmup='linear',
warmup_iters=500,
warmup_ratio=0.3333333333333333,
step=[8, 11])
runner = dict(type='EpochBasedRunner', max_epochs=12)
evaluation = dict(interval=3, metric='mAP')
checkpoint_config = dict(interval=4)
log_config = dict(
interval=50,
hooks=[
dict(type='TextLoggerHook'),
dict(
type='WandbLoggerHook',
init_kwargs=dict(project='mmrotate', name='dcflv1'))
])
dist_params = dict(backend='nccl')
log_level = 'INFO'
load_from = None
resume_from = None
workflow = [('train', 1)]
opencv_num_threads = 0
mp_start_method = 'fork'
angle_version = 'le135'
model = dict(
type='RotatedRetinaNet',
backbone=dict(
type='ResNet',
depth=50,
num_stages=4,
out_indices=(0, 1, 2, 3),
frozen_stages=1,
zero_init_residual=False,
norm_cfg=dict(type='BN', requires_grad=True),
norm_eval=True,
style='pytorch',
init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50')),
neck=dict(
type='FPN',
in_channels=[256, 512, 1024, 2048],
out_channels=256,
start_level=1,
add_extra_convs='on_input',
num_outs=5),
bbox_head=dict(
type='RDCFLHead',
num_classes=15,
in_channels=256,
stacked_convs=4,
feat_channels=256,
assign_by_circumhbbox=None,
dcn_assign=True,
dilation_rate=4,
anchor_generator=dict(
type='RotatedAnchorGenerator',
octave_base_scale=4,
scales_per_octave=1,
ratios=[1.0],
strides=[8, 16, 32, 64, 128]),
bbox_coder=dict(
type='DeltaXYWHAOBBoxCoder',
angle_range='le135',
norm_factor=1,
edge_swap=False,
proj_xy=True,
target_means=(0.0, 0.0, 0.0, 0.0, 0.0),
target_stds=(1.0, 1.0, 1.0, 1.0, 1.0)),
loss_cls=dict(
type='FocalLoss',
use_sigmoid=True,
gamma=2.0,
alpha=0.25,
loss_weight=1.0),
reg_decoded_bbox=True,
loss_bbox=dict(type='RotatedIoULoss', loss_weight=1.0)),
train_cfg=dict(
assigner=dict(
type='C2FAssigner',
ignore_iof_thr=-1,
gpu_assign_thr=1024,
iou_calculator=dict(type='RBboxMetrics2D'),
assign_metric='gjsd',
topk=16,
topq=12,
constraint='dgmm',
gauss_thr=0.6),
allowed_border=-1,
pos_weight=-1,
debug=False),
test_cfg=dict(
nms_pre=2000,
min_bbox_size=0,
score_thr=0.05,
nms=dict(iou_thr=0.4),
max_per_img=2000))
work_dir = 'work_dirs/dcfl'
auto_resume = False
gpu_ids = range(0, 1)

Hi, here are several suggestions:
(1) It may be better to retain the original batch size 2 and learning rate 0.0025, the batch size has a great effect on the DOTA dataset. See discussion.
(2) Set norm_cfg=dict(type='BN', requires_grad=False), since the BN will slightly degrade the training results on aerial datasets.

Thank you very much for your help!
After I re-performed the experiment according to your configuration, the online test map is 74.13.