DeepSpeed (ZeRO stage 2) does not automatically exclude frozen parameters [Bug]

Question


Baboom-l opened this issue 3 months ago · 1 comment

Baboom-l commented 3 months ago

Prerequisite

I have searched Issues and Discussions but cannot get the expected help.
The bug has not been fixed in the latest version (https://github.com/open-mmlab/mmengine).

Environment

mmdetection 3.3.0
mmengine 0.8.4
torch 1.13.1

Reproduces the problem - code sample

runner_type = 'FlexibleRunner'
strategy = dict(
    type='DeepSpeedStrategy',
    gradient_clipping=0.1,
    fp16=dict(
        enabled=True,
        fp16_master_weights_and_grads=False,
        loss_scale=0,
        loss_scale_window=500,
        hysteresis=2,
        min_loss_scale=1,
        initial_scale_power=15,
    ),
    inputs_to_half=['inputs'],
    zero_optimization=dict(
        stage=2,
        allgather_partitions=True,
        reduce_scatter=True,
        allgather_bucket_size=50000000,
        reduce_bucket_size=50000000,
        overlap_comm=True,
        contiguous_gradients=True,
        cpu_offload=False),
)
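
With loss_scale=0, DeepSpeed uses dynamic loss scaling, starting from 2**initial_scale_power (here 2**15 = 32768). The sketch below is a simplified, framework-free model of how loss_scale_window, hysteresis and min_loss_scale interact, assuming a scale factor of 2. It only illustrates the settings above; the class DynamicLossScaler here is hypothetical and is not DeepSpeed's own implementation.

```python
class DynamicLossScaler:
    """Simplified dynamic loss scaler (illustrative sketch, not DeepSpeed's code)."""

    def __init__(self, initial_scale_power=15, scale_window=500,
                 hysteresis=2, min_scale=1.0, factor=2.0):
        self.scale = 2.0 ** initial_scale_power
        self.scale_window = scale_window
        self.hysteresis = hysteresis
        self.remaining_hysteresis = hysteresis
        self.min_scale = min_scale
        self.factor = factor  # assumed growth/shrink factor of 2
        self.clean_steps = 0  # steps since the last overflow

    def update(self, overflow: bool) -> bool:
        """Update the scale after one step; return True if the step is applied."""
        if overflow:
            self.clean_steps = 0
            if self.remaining_hysteresis <= 1:
                # Tolerance exhausted: shrink the scale, never below min_scale.
                self.scale = max(self.scale / self.factor, self.min_scale)
            else:
                # Absorb this overflow without shrinking the scale.
                self.remaining_hysteresis -= 1
            return False  # the optimizer step is skipped on overflow
        self.clean_steps += 1
        if self.clean_steps % self.scale_window == 0:
            # A full window without overflow: grow the scale, reset tolerance.
            self.scale *= self.factor
            self.remaining_hysteresis = self.hysteresis
        return True


scaler = DynamicLossScaler()
print(scaler.scale)  # 32768.0
scaler.update(overflow=True)  # first overflow absorbed by hysteresis=2
print(scaler.scale)  # 32768.0
scaler.update(overflow=True)  # second overflow halves the scale
print(scaler.scale)  # 16384.0
for _ in range(500):  # one clean loss_scale_window doubles it again
    scaler.update(overflow=False)
print(scaler.scale)  # 32768.0
```

In this model, hysteresis=2 means a single isolated overflow only skips a step, and the scale is halved on the second overflow before the next clean window completes.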

optim_wrapper = dict(
    type='DeepSpeedOptimWrapper',
    optimizer=dict(
        type='AdamW',
        lr=0.0001,  # 0.0002 for DeformDETR
        weight_decay=0.0001),
    # clip_grad=dict(max_norm=0.1, norm_type=2),
    paramwise_cfg=dict(
        custom_keys={
            'memory_trans_norm': dict(decay_mult=0),
            'in_proj_bias': dict(decay_mult=0),
            'backbone': dict(lr_mult=0.01)},
        norm_decay_mult=0,
        bias_decay_mult=0,
        bypass_duplicate=True))
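
As a rough illustration of what paramwise_cfg asks for, the sketch below computes a per-parameter learning rate and weight decay from the base lr=0.0001 and weight_decay=0.0001. It approximates, as we understand them, the matching rules of MMEngine's DefaultOptimWrapperConstructor: custom keys are matched as substrings of the parameter name with longer keys checked first, and norm_decay_mult / bias_decay_mult apply only when no custom key matched. The helper effective_hparams, its is_norm flag and the parameter names are hypothetical.

```python
BASE_LR = 0.0001
BASE_WD = 0.0001

# Mirrors the paramwise_cfg in the config above.
CUSTOM_KEYS = {
    'memory_trans_norm': dict(decay_mult=0),
    'in_proj_bias': dict(decay_mult=0),
    'backbone': dict(lr_mult=0.01),
}
NORM_DECAY_MULT = 0
BIAS_DECAY_MULT = 0


def effective_hparams(name: str, is_norm: bool = False) -> dict:
    """Approximate lr / weight_decay for one parameter (hypothetical helper)."""
    lr, wd = BASE_LR, BASE_WD
    # Longer keys are checked first, so a more specific key wins.
    for key in sorted(sorted(CUSTOM_KEYS), key=len, reverse=True):
        if key in name:
            cfg = CUSTOM_KEYS[key]
            return dict(lr=lr * cfg.get('lr_mult', 1.0),
                        weight_decay=wd * cfg.get('decay_mult', 1.0))
    # No custom key matched: fall back to the norm / bias multipliers.
    if is_norm:
        wd *= NORM_DECAY_MULT
    elif name.rsplit('.', 1)[-1] == 'bias':
        wd *= BIAS_DECAY_MULT
    return dict(lr=lr, weight_decay=wd)


for n in ['backbone.layer1.0.conv1.weight',
          'encoder.layers.0.self_attn.in_proj_bias',
          'bbox_head.fc.bias',
          'bbox_head.fc.weight']:
    print(n, effective_hparams(n))
```

Under these assumptions the backbone trains at roughly 1e-6, while biases and norm layers outside the backbone get no weight decay.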

# To debug
default_hooks.update(dict(logger=dict(interval=1)))  # noqa
log_processor.update(dict(window_size=1))  # noqa

Reproduces the problem - command or script

After I freeze some of the model's weights, running training raises an error.
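
The issue title suggests that, under ZeRO stage 2, parameters with requires_grad=False still end up in the optimizer's parameter groups. Independently of the fix referenced in the answer, one possible workaround is to drop frozen parameters from each param group before the optimizer is built. The sketch below shows only that filtering step; Param is a hypothetical stand-in for torch.nn.Parameter (only its requires_grad flag matters here), and drop_frozen_params is a hypothetical helper, not an MMEngine or DeepSpeed API.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class Param:
    """Hypothetical stand-in for torch.nn.Parameter; only requires_grad matters."""
    name: str
    requires_grad: bool = True


def drop_frozen_params(param_groups: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Return param groups with frozen params removed and empty groups dropped."""
    cleaned = []
    for group in param_groups:
        trainable = [p for p in group['params'] if p.requires_grad]
        if trainable:
            # Keep per-group options (lr, weight_decay, ...) unchanged.
            cleaned.append({**group, 'params': trainable})
    return cleaned


groups = [
    {'params': [Param('backbone.conv1.weight', requires_grad=False)], 'lr': 1e-6},
    {'params': [Param('head.fc.weight'), Param('head.fc.bias')], 'lr': 1e-4},
]
print([[p.name for p in g['params']] for g in drop_frozen_params(groups)])
# [['head.fc.weight', 'head.fc.bias']]
```

Real torch.nn.Parameter objects also expose requires_grad, so the same filter applies to them unchanged, and the cleaned list has the usual shape of param-group dicts that torch.optim optimizers accept.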

Reproduces the problem - error message

Additional information

No response

Answer 1 · 2024-04-28T03:03:31.000Z

Resolved by #1441.