weight mismatch
Senwang98 opened this issue · 3 comments
@ArchipLab-LinfengZhang
Hi,
When using the code you provided, I ran into a weight-loading mismatch problem.
Since the URL for x101-32d.pth is unavailable, I downloaded cascade_mask_rcnn_x101_32x4d_fpn_dconv_c3-c5_1x_coco-e75f90c8 instead.
The teacher's backbone is configured without pretrained weights:
_base_ = './cascade_mask_rcnn_r50_fpn_1x_coco.py'
model = dict(
    # pretrained='open-mmlab://resnext101_32x4d',
    backbone=dict(
        type='ResNeXt',
        depth=101,
        groups=32,
        base_width=4,
        num_stages=4,
        out_indices=(0, 1, 2, 3),
        frozen_stages=1,
        norm_cfg=dict(type='BN', requires_grad=True),
        style='pytorch'))
build_teacher() function:
def build_teacher():
    teacher_cfg = Config.fromfile("configs/dcn/cascade_mask_rcnn_x101_32x4d_fpn_dconv_c3-c5_1x_coco.py")
    teacher = build_detector(
        teacher_cfg.model, train_cfg=teacher_cfg.train_cfg, test_cfg=teacher_cfg.test_cfg)
    load_checkpoint(teacher,
                    # "mmdetection/checkpoints/cascade_mask_rcnn_x101_32x4d_fpn_dconv_c3-c5_1x_c.pth",
                    "chechpoints/cascade_mask_rcnn_x101_32x4d_fpn_dconv_c3-c5_1x_coco-e75f90c8.pth",
                    map_location='cpu')
    return teacher
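For reference, a quick sanity check one could run (just a sketch; it reuses the same config and checkpoint path as build_teacher() above, so adjust them to your setup) to confirm whether the checkpoint itself matches the teacher architecture:

```python
import torch
from mmcv import Config
from mmdet.models import build_detector

# Build the teacher exactly as build_teacher() does, then compare its
# state dict against the downloaded checkpoint to see which parameters
# are missing or have a different shape.
teacher_cfg = Config.fromfile(
    "configs/dcn/cascade_mask_rcnn_x101_32x4d_fpn_dconv_c3-c5_1x_coco.py")
teacher = build_detector(
    teacher_cfg.model, train_cfg=teacher_cfg.train_cfg, test_cfg=teacher_cfg.test_cfg)

ckpt = torch.load(
    "chechpoints/cascade_mask_rcnn_x101_32x4d_fpn_dconv_c3-c5_1x_coco-e75f90c8.pth",
    map_location='cpu')
state = ckpt.get('state_dict', ckpt)

for name, param in teacher.state_dict().items():
    if name not in state:
        print('missing in checkpoint:', name)
    elif tuple(state[name].shape) != tuple(param.shape):
        print('shape mismatch:', name, state[name].shape, param.shape)
```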
Although I think I have replaced everything you mentioned, I still get the following teacher weight-mismatch warnings:
2021-12-10 20:50:37,810 - mmdet - INFO - load model from: torchvision://resnet50
2021-12-10 20:50:37,940 - mmdet - WARNING - The model and loaded state dict do not match exactly
size mismatch for layer1.0.conv1.weight: copying a param with shape torch.Size([64, 64, 1, 1]) from checkpoint, the shape in current model is torch.Size([128, 64, 1, 1]).
size mismatch for layer1.0.bn1.weight: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([128]).
size mismatch for layer1.0.bn1.bias: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([128]).
size mismatch for layer1.0.bn1.running_mean: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([128]).
Have you met a similar problem?
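My current guess (not confirmed) is that commenting out the pretrained line does not disable it: the _base_ config still makes the backbone load torchvision://resnet50 (that is what the first log line says), and those ResNet-50 weights cannot fit the ResNeXt-101 32x4d backbone, whose layer1 uses 128 channels. A minimal sketch of the override I would try, assuming this is the cause (the teacher's real weights are loaded later by load_checkpoint anyway):

```python
# Sketch of a possible fix: explicitly override the inherited pretrained
# weights instead of commenting the line out, so nothing is loaded into the
# backbone at build time. The full teacher checkpoint is loaded afterwards
# via load_checkpoint() in build_teacher().
_base_ = './cascade_mask_rcnn_r50_fpn_1x_coco.py'
model = dict(
    pretrained=None,  # override the torchvision://resnet50 default from _base_
    backbone=dict(
        type='ResNeXt',
        depth=101,
        groups=32,
        base_width=4,
        num_stages=4,
        out_indices=(0, 1, 2, 3),
        frozen_stages=1,
        norm_cfg=dict(type='BN', requires_grad=True),
        style='pytorch'))
```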
I have been debugging this for several days and tried several mmdet versions without getting it to run. Model loading is no longer the problem. Inside build_teacher the teacher also has to be wrapped in MMDistributedDataParallel, otherwise it complains that a list(Tensor) argument does not match the expected DataContainer, but then I get TypeError: kd_feat_loss is not a tensor or list of tensors.
losses: {'kd_feat_loss': 0, 'kd_channel_loss': 0, 'kd_spatial_loss': 0, 'kd_nonlocal_loss': 0.0, 'loss_rpn_cls': [tensor(0.4318, device='cuda:0', grad_fn=<MulBackward0>), tensor(0.1646, device='cuda:0', grad_fn=<MulBackward0>), tensor(0.0469, device='cuda:0', grad_fn=<MulBackward0>), tensor(0.0213, device='cuda:0', grad_fn=<MulBackward0>), tensor(0.0296, device='cuda:0', grad_fn=<MulBackward0>)], 'loss_rpn_bbox': [tensor(0.0399, device='cuda:0', grad_fn=<MulBackward0>), tensor(0.0577, device='cuda:0', grad_fn=<MulBackward0>), tensor(0.0251, device='cuda:0', grad_fn=<MulBackward0>), tensor(0.0113, device='cuda:0', grad_fn=<MulBackward0>), tensor(0.0429, device='cuda:0', grad_fn=<MulBackward0>)], 'loss_cls': tensor(4.4297, device='cuda:0', grad_fn=<MulBackward0>), 'acc': tensor([0.], device='cuda:0'), 'loss_bbox': tensor(0.0252, device='cuda:0', grad_fn=<MulBackward0>)}
Several of the KD losses it returns are plain numeric values, while mmdet's self._parse_losses(losses) needs each loss to be a Tensor or a list of Tensors.
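A workaround that might be enough (just a sketch; it assumes the kd_* losses come out as plain Python numbers, as in the log above, rather than tensors): convert them to zero-dim tensors before self._parse_losses sees them.

```python
import torch

def to_tensor_losses(losses, device='cuda:0'):
    """Hypothetical helper: _parse_losses() only accepts Tensors or lists of
    Tensors, so any loss that is a plain Python number (e.g. the 0-valued
    kd_* losses above) is converted to a zero-dim tensor first."""
    fixed = {}
    for name, value in losses.items():
        if isinstance(value, (int, float)):
            fixed[name] = torch.tensor(float(value), device=device)
        else:
            fixed[name] = value  # already a Tensor or a list of Tensors
    return fixed

# usage inside train_step / forward_train (sketch):
# losses = to_tensor_losses(losses)
# loss, log_vars = self._parse_losses(losses)
```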
@Shawnnnnn
I would suggest implementing this ICLR paper on top of the CWD codebase instead. The idea is simple anyway, so there is no need to get stuck on the syntax here; besides, this work is not that clean and its numbers do not look great by today's standards, so skipping it will not hurt your grasp of the KD-for-detection field.