open-mmlab/mmyolo

top_down_layer_inputs = torch.cat([upsample_feat, feat_low], 1) RuntimeError: Sizes of tensors must match except in dimension 1. Expected size 16 but got size 15 for tensor number 1 in the list.

99HU opened this issue · 6 comments

99HU commented


🐞 Describe the bug

I am trying to train YOLOv8 on my own dataset, which has a single class (num_classes = 1). I am a beginner with mmyolo. Is there a mistake in my config? It is based on the sample config from mmyolo. Any help would be appreciated.

The configs:

```python
_base_ = [
    '../configs/_base_/default_runtime.py',
    '../configs/_base_/det_p5_tta.py'
]

# ========================Frequently modified parameters======================
# -----data related-----
data_root = 'PaperDataSet'  # Root path of data
# Path of train annotation file
train_ann_file = 'Train.json'
train_data_prefix = ''  # Prefix of train image path
# Path of val annotation file
val_ann_file = 'Test.json'
val_data_prefix = '/'  # Prefix of val image path

num_classes = 1  # Number of classes for classification
# Batch size of a single GPU during training
train_batch_size_per_gpu = 16
# Worker to pre-fetch data for each single GPU during training
train_num_workers = 2
# persistent_workers must be False if num_workers is 0
persistent_workers = True

# -----train val related-----
# Base learning rate for optim_wrapper. Corresponding to 8xb16=64 bs
base_lr = 0.01
max_epochs = 200  # Maximum training epochs
# Disable mosaic augmentation for final 10 epochs (stage 2)
close_mosaic_epochs = 10

model_test_cfg = dict(
    # The config of multi-label for multi-class prediction.
    multi_label=True,
    # The number of boxes before NMS
    nms_pre=30000,
    score_thr=0.001,  # Threshold to filter out boxes.
    nms=dict(type='nms', iou_threshold=0.7),  # NMS type and threshold
    max_per_img=300)  # Max number of detections of each image

# ========================Possible modified parameters========================
# -----data related-----
img_scale = (256, 236)  # width, height
# Dataset type, this will be used to define the dataset
dataset_type = 'YOLOv5CocoDataset'
# Batch size of a single GPU during validation
val_batch_size_per_gpu = 1
# Worker to pre-fetch data for each single GPU during validation
val_num_workers = 2

# Config of batch shapes. Only on val.
# We tested YOLOv8-m will get 0.02 higher than not using it.
batch_shapes_cfg = None
# You can turn on batch_shapes_cfg by uncommenting the following lines.
# batch_shapes_cfg = dict(
#     type='BatchShapePolicy',
#     batch_size=val_batch_size_per_gpu,
#     img_size=img_scale[0],
#     # The image scale of padding should be divided by pad_size_divisor
#     size_divisor=32,
#     # Additional paddings for pixel scale
#     extra_pad_ratio=0.5)

# -----model related-----
# The scaling factor that controls the depth of the network structure
deepen_factor = 0.33
# The scaling factor that controls the width of the network structure
widen_factor = 0.5
# Strides of multi-scale prior box
strides = [8, 16, 32]
# The output channel of the last stage
last_stage_out_channels = 1024
num_det_layers = 3  # The number of model output scales
norm_cfg = dict(type='BN', momentum=0.03, eps=0.001)  # Normalization config

# -----train val related-----
affine_scale = 0.5  # YOLOv5RandomAffine scaling ratio
# YOLOv5RandomAffine aspect ratio of width and height thres to filter bboxes
max_aspect_ratio = 100
tal_topk = 10  # Number of bbox selected in each level
tal_alpha = 0.5  # A Hyper-parameter related to alignment_metrics
tal_beta = 6.0  # A Hyper-parameter related to alignment_metrics
# TODO: Automatically scale loss_weight based on number of detection layers
loss_cls_weight = 0.5
loss_bbox_weight = 7.5
# Since the dfloss is implemented differently in the official
# and mmdet, we're going to divide loss_weight by 4.
loss_dfl_weight = 1.5 / 4
lr_factor = 0.01  # Learning rate scaling factor
weight_decay = 0.0005
# Save model checkpoint and validation intervals in stage 1
save_epoch_intervals = 10
# validation intervals in stage 2
val_interval_stage2 = 1
# The maximum checkpoints to keep.
max_keep_ckpts = 2
# Single-scale training is recommended to
# be turned on, which can speed up training.
env_cfg = dict(cudnn_benchmark=True)

# ===============================Unmodified in most cases====================
model = dict(
    type='YOLODetector',
    data_preprocessor=dict(
        type='YOLOv5DetDataPreprocessor',
        mean=[0., 0., 0.],
        std=[255., 255., 255.],
        bgr_to_rgb=True),
    backbone=dict(
        type='YOLOv8CSPDarknet',
        arch='P5',
        last_stage_out_channels=last_stage_out_channels,
        deepen_factor=deepen_factor,
        widen_factor=widen_factor,
        norm_cfg=norm_cfg,
        act_cfg=dict(type='SiLU', inplace=True)),
    neck=dict(
        type='YOLOv8PAFPN',
        deepen_factor=deepen_factor,
        widen_factor=widen_factor,
        in_channels=[256, 512, last_stage_out_channels],
        out_channels=[256, 512, last_stage_out_channels],
        num_csp_blocks=3,
        norm_cfg=norm_cfg,
        act_cfg=dict(type='SiLU', inplace=True)),
    bbox_head=dict(
        type='YOLOv8Head',
        head_module=dict(
            type='YOLOv8HeadModule',
            num_classes=num_classes,
            in_channels=[256, 512, last_stage_out_channels],
            widen_factor=widen_factor,
            reg_max=16,
            norm_cfg=norm_cfg,
            act_cfg=dict(type='SiLU', inplace=True),
            featmap_strides=strides),
        prior_generator=dict(
            type='mmdet.MlvlPointGenerator', offset=0.5, strides=strides),
        bbox_coder=dict(type='DistancePointBBoxCoder'),
        # scaled based on number of detection layers
        loss_cls=dict(
            type='mmdet.CrossEntropyLoss',
            use_sigmoid=True,
            reduction='none',
            loss_weight=loss_cls_weight),
        loss_bbox=dict(
            type='IoULoss',
            iou_mode='ciou',
            bbox_format='xyxy',
            reduction='sum',
            loss_weight=loss_bbox_weight,
            return_iou=False),
        loss_dfl=dict(
            type='mmdet.DistributionFocalLoss',
            reduction='mean',
            loss_weight=loss_dfl_weight)),
    train_cfg=dict(
        assigner=dict(
            type='BatchTaskAlignedAssigner',
            num_classes=num_classes,
            use_ciou=True,
            topk=tal_topk,
            alpha=tal_alpha,
            beta=tal_beta,
            eps=1e-9)),
    test_cfg=model_test_cfg)

albu_train_transforms = [
    dict(type='Blur', p=0.01),
    dict(type='MedianBlur', p=0.01),
    dict(type='ToGray', p=0.01),
    dict(type='CLAHE', p=0.01)
]

pre_transform = [
    dict(type='LoadImageFromFile', backend_args=_base_.backend_args),
    dict(type='LoadAnnotations', with_bbox=True)
]

last_transform = [
    dict(
        type='mmdet.Albu',
        transforms=albu_train_transforms,
        bbox_params=dict(
            type='BboxParams',
            format='pascal_voc',
            label_fields=['gt_bboxes_labels', 'gt_ignore_flags']),
        keymap={
            'img': 'image',
            'gt_bboxes': 'bboxes'
        }),
    dict(type='YOLOv5HSVRandomAug'),
    dict(type='mmdet.RandomFlip', prob=0.5),
    dict(
        type='mmdet.PackDetInputs',
        meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape', 'flip',
                   'flip_direction'))
]

train_pipeline = [
    *pre_transform,
    dict(
        type='Mosaic',
        img_scale=img_scale,
        pad_val=114.0,
        pre_transform=pre_transform),
    dict(
        type='YOLOv5RandomAffine',
        max_rotate_degree=0.0,
        max_shear_degree=0.0,
        scaling_ratio_range=(1 - affine_scale, 1 + affine_scale),
        max_aspect_ratio=max_aspect_ratio,
        # img_scale is (width, height)
        border=(-img_scale[0] // 2, -img_scale[1] // 2),
        border_val=(114, 114, 114)),
    *last_transform
]

train_pipeline_stage2 = [
    *pre_transform,
    dict(type='YOLOv5KeepRatioResize', scale=img_scale),
    dict(
        type='LetterResize',
        scale=img_scale,
        allow_scale_up=True,
        pad_val=dict(img=114.0)),
    dict(
        type='YOLOv5RandomAffine',
        max_rotate_degree=0.0,
        max_shear_degree=0.0,
        scaling_ratio_range=(1 - affine_scale, 1 + affine_scale),
        max_aspect_ratio=max_aspect_ratio,
        border_val=(114, 114, 114)),
    *last_transform
]

train_dataloader = dict(
    batch_size=train_batch_size_per_gpu,
    num_workers=train_num_workers,
    persistent_workers=persistent_workers,
    pin_memory=True,
    sampler=dict(type='DefaultSampler', shuffle=True),
    collate_fn=dict(type='yolov5_collate'),
    dataset=dict(
        type=dataset_type,
        data_root=data_root,
        ann_file=train_ann_file,
        data_prefix=dict(img=train_data_prefix),
        filter_cfg=dict(filter_empty_gt=False, min_size=32),
        pipeline=train_pipeline))

test_pipeline = [
    dict(type='LoadImageFromFile', backend_args=_base_.backend_args),
    dict(type='YOLOv5KeepRatioResize', scale=img_scale),
    dict(
        type='LetterResize',
        scale=img_scale,
        allow_scale_up=False,
        pad_val=dict(img=114)),
    dict(type='LoadAnnotations', with_bbox=True, scope='mmdet'),
    dict(
        type='mmdet.PackDetInputs',
        meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape',
                   'scale_factor', 'pad_param'))
]

val_dataloader = dict(
    batch_size=val_batch_size_per_gpu,
    num_workers=val_num_workers,
    persistent_workers=persistent_workers,
    pin_memory=True,
    drop_last=False,
    sampler=dict(type='DefaultSampler', shuffle=False),
    dataset=dict(
        type=dataset_type,
        data_root=data_root,
        test_mode=True,
        data_prefix=dict(img=val_data_prefix),
        ann_file=val_ann_file,
        pipeline=test_pipeline,
        batch_shapes_cfg=batch_shapes_cfg))

test_dataloader = val_dataloader

param_scheduler = None
optim_wrapper = dict(
    type='OptimWrapper',
    clip_grad=dict(max_norm=10.0),
    optimizer=dict(
        type='SGD',
        lr=base_lr,
        momentum=0.937,
        weight_decay=weight_decay,
        nesterov=True,
        batch_size_per_gpu=train_batch_size_per_gpu),
    constructor='YOLOv5OptimizerConstructor')

default_hooks = dict(
    param_scheduler=dict(
        type='YOLOv5ParamSchedulerHook',
        scheduler_type='linear',
        lr_factor=lr_factor,
        max_epochs=max_epochs),
    checkpoint=dict(
        type='CheckpointHook',
        interval=save_epoch_intervals,
        save_best='auto',
        max_keep_ckpts=max_keep_ckpts))

custom_hooks = [
    dict(
        type='EMAHook',
        ema_type='ExpMomentumEMA',
        momentum=0.0001,
        update_buffers=True,
        strict_load=False,
        priority=49),
    dict(
        type='mmdet.PipelineSwitchHook',
        switch_epoch=max_epochs - close_mosaic_epochs,
        switch_pipeline=train_pipeline_stage2)
]

val_evaluator = dict(
    type='mmdet.CocoMetric',
    proposal_nums=(100, 1, 10),
    ann_file=data_root + "\\" + val_ann_file,
    metric='bbox')
test_evaluator = val_evaluator

train_cfg = dict(
    type='EpochBasedTrainLoop',
    max_epochs=max_epochs,
    val_interval=save_epoch_intervals,
    dynamic_intervals=[((max_epochs - close_mosaic_epochs),
                        val_interval_stage2)])

val_cfg = dict(type='ValLoop')
test_cfg = dict(type='TestLoop')
```

The Command

python tools\train.py moudle\yolov8.py

The error message

11/17 14:24:07 - mmengine - INFO - Checkpoints will be saved to E:\HWS\mmlab\mmyolo-main\work_dirs\yolov8.
Traceback (most recent call last):
  File "tools\train.py", line 123, in <module>
    main()
  File "tools\train.py", line 119, in main
    runner.train()
  File "D:\ProgramData\anaconda3\envs\PaperLocation\lib\site-packages\mmengine\runner\runner.py", line 1777, in train
    model = self.train_loop.run() # type: ignore
  File "D:\ProgramData\anaconda3\envs\PaperLocation\lib\site-packages\mmengine\runner\loops.py", line 96, in run
    self.run_epoch()
  File "D:\ProgramData\anaconda3\envs\PaperLocation\lib\site-packages\mmengine\runner\loops.py", line 112, in run_epoch
    self.run_iter(idx, data_batch)
  File "D:\ProgramData\anaconda3\envs\PaperLocation\lib\site-packages\mmengine\runner\loops.py", line 128, in run_iter
    outputs = self.runner.model.train_step(
  File "D:\ProgramData\anaconda3\envs\PaperLocation\lib\site-packages\mmengine\model\base_model\base_model.py", line 114, in train_step
    losses = self._run_forward(data, mode='loss') # type: ignore
  File "D:\ProgramData\anaconda3\envs\PaperLocation\lib\site-packages\mmengine\model\base_model\base_model.py", line 346, in _run_forward
    results = self(**data, mode=mode)
  File "C:\Users\A\AppData\Roaming\Python\Python38\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "D:\ProgramData\anaconda3\envs\PaperLocation\lib\site-packages\mmdet\models\detectors\base.py", line 92, in forward
    return self.loss(inputs, data_samples)
  File "D:\ProgramData\anaconda3\envs\PaperLocation\lib\site-packages\mmdet\models\detectors\single_stage.py", line 77, in loss
    x = self.extract_feat(batch_inputs)
  File "D:\ProgramData\anaconda3\envs\PaperLocation\lib\site-packages\mmdet\models\detectors\single_stage.py", line 148, in extract_feat
    x = self.neck(x)
  File "C:\Users\A\AppData\Roaming\Python\Python38\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "e:\hws\mmlab\mmyolo-main\mmyolo\models\necks\base_yolo_neck.py", line 239, in forward
    top_down_layer_inputs = torch.cat([upsample_feat, feat_low], 1)
RuntimeError: Sizes of tensors must match except in dimension 1. Expected size 16 but got size 15 for tensor number 1 in the list.

Environment

mmcv 2.0.1
mmdet 3.2.0
mmengine 0.9.1
mmyolo 0.6.0

Additional information

I am using my own dataset, which is in MS COCO format. I have already trained on it with other OpenMMLab projects, so the dataset itself is probably not the cause. My config is based on the one shipped in configs, and I only modified the "Frequently modified parameters" and "Possible modified parameters" parts. Is there anything else I need to modify that I missed? I look forward to your reply. Thanks very much.

I solved this issue by modifying the img_scale.

99HU commented

> I solved this issue by modifying the img_scale.

Thank you very much. I made a mistake. My problem has been solved. : )

How did both of you, @99HU and @Tym216, solve this? Please give details, as I am facing the same issue.

99HU commented

> How did both of you, @99HU and @Tym216, solve this? Please give details, as I am facing the same issue.

I changed my img_scale from (256, 236) to (256, 256), and that solved the issue.
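A likely explanation for why this change helps, sketched under assumptions rather than taken from the mmyolo source: if every 2x downsampling step in the backbone maps a side of length n to ceil(n / 2) (as a kernel-3, stride-2, padding-1 convolution does) and the neck upsamples by exactly 2x before `torch.cat`, then a side of 236 gives feature maps that no longer line up. The helper names `feature_sizes`, `neck_concat_ok`, and `round_up_to_stride` below are hypothetical and only illustrate the arithmetic.

```python
import math
from typing import List

STRIDES = [8, 16, 32]  # same values as `strides` in the config


def feature_sizes(size: int) -> List[int]:
    """Sizes of the stride-8/16/32 feature maps along one image side.

    Assumption: each 2x downsampling step maps n to ceil(n / 2).
    """
    sizes = []
    for stride in STRIDES:
        n = size
        for _ in range(int(math.log2(stride))):
            n = math.ceil(n / 2)
        sizes.append(n)
    return sizes


def neck_concat_ok(size: int) -> bool:
    """True if each coarse map, upsampled 2x, matches the next finer map."""
    sizes = feature_sizes(size)
    return all(2 * coarse == fine for fine, coarse in zip(sizes, sizes[1:]))


def round_up_to_stride(size: int, divisor: int = 32) -> int:
    """Smallest multiple of `divisor` that is >= size."""
    return math.ceil(size / divisor) * divisor


for side in (236, 256):
    print(side, feature_sizes(side), neck_concat_ok(side))
print(round_up_to_stride(236))
```

Under these assumptions, a side of 236 yields maps of 30, 15, and 8. The stride-32 map upsampled 2x is 16, while the stride-16 map is 15, which is consistent with "Expected size 16 but got size 15" in the traceback. A side of 256 yields 32, 16, and 8, which line up, and rounding 236 up to the next multiple of 32 gives 256.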

Did you change it in all the places where img_scale is used? Also, what is your input size, 640x640?

Hi @99HU, did you face KeyError: 'YOLOv5CocoDataset is not in the mmengine::dataset registry. Please check whether the value of YOLOv5CocoDataset is correct or it was registered as expected. More details can be found at https://mmengine.readthedocs.io/en/latest/advanced_tutorials/config.html#import-the-custom-module' ?
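One possible cause, offered as an assumption and not a confirmed diagnosis for this thread: the message names the `mmengine::dataset` registry, which suggests the mmyolo scope was not active when the dataset was built. The stock mmyolo `default_runtime.py` sets `default_scope = 'mmyolo'`, so a config that does not inherit it may fall back to the mmengine scope. A hedged config fragment that sets the scope explicitly and asks mmengine to import the dataset module through `custom_imports`:

```python
# Assumption: mmyolo is installed in the active Python environment.
# Look up registered modules (datasets, models, transforms) in mmyolo first.
default_scope = 'mmyolo'

# Import mmyolo.datasets when the config is loaded, so that
# YOLOv5CocoDataset is registered before the dataloader is built.
custom_imports = dict(imports=['mmyolo.datasets'], allow_failed_imports=False)
```

If the error persists with these two settings, checking that the installed mmyolo version matches the environment listed in the issue (mmyolo 0.6.0, mmdet 3.2.0, mmengine 0.9.1) may be a reasonable next step.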