Question about lr
rightchose opened this issue · 5 comments
In training stage three, your paper says: "Finally, we train the full model with an initial learning rate of 0.02 and 0.002, respectively, for the weights in the backbone and DA-CSPN++." But on every iteration, won't the call to `adjust_learning_rate` adjust all params (backbone and DA-CSPN++) with the same lr?
The optimizer in stage 3, with a different learning rate for each parameter group, is defined in main.py:
```python
elif (args.network_model == 'pe'):
    model_bone_params = [
        p for _, p in model.backbone.named_parameters() if p.requires_grad
    ]
    model_new_params = [
        p for _, p in model.named_parameters() if p.requires_grad
    ]
    model_new_params = list(set(model_new_params) - set(model_bone_params))
    optimizer = torch.optim.Adam(
        [{'params': model_bone_params, 'lr': args.lr / 10},
         {'params': model_new_params}],
        lr=args.lr, weight_decay=args.weight_decay, betas=(0.9, 0.99))
```
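For reference, here is a minimal standalone sketch (toy `Linear` modules stand in for the real backbone and DA-CSPN++ layers; the numbers are illustrative) of how a group-specific `'lr'` overrides the default passed to the optimizer:

```python
import torch

backbone = torch.nn.Linear(4, 4)  # stands in for model.backbone
head = torch.nn.Linear(4, 4)      # stands in for the DA-CSPN++ layers

lr = 0.02
optimizer = torch.optim.Adam(
    [{'params': backbone.parameters(), 'lr': lr / 10},  # backbone group: 0.002
     {'params': head.parameters()}],                    # no 'lr' key: uses the default
    lr=lr, weight_decay=1e-6, betas=(0.9, 0.99))

print([g['lr'] for g in optimizer.param_groups])  # [0.002, 0.02]
```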
I know that. But the `iterate` function runs:
```python
if mode == 'train':
    model.train()
    lr = helper.adjust_learning_rate(args.lr, optimizer, actual_epoch, args)
```
When the code reaches this point, it applies `adjust_learning_rate`, which is defined as:
```python
def adjust_learning_rate(lr_init, optimizer, epoch, args):
    """Sets the learning rate to the initial LR decayed by 10 every 5 epochs"""
    #lr = lr_init * (0.5**(epoch // 5))
    #'''
    lr = lr_init
    if (args.network_model == 'pe' and args.freeze_backbone == False):
        if (epoch >= 10):
            lr = lr_init * 0.5
        if (epoch >= 20):
            lr = lr_init * 0.1
        if (epoch >= 30):
            lr = lr_init * 0.01
        if (epoch >= 40):
            lr = lr_init * 0.0005
        if (epoch >= 50):
            lr = lr_init * 0.00001
    else:
        if (epoch >= 10):
            lr = lr_init * 0.5
        if (epoch >= 15):
            lr = lr_init * 0.1
        if (epoch >= 25):
            lr = lr_init * 0.01
    #'''
    for param_group in optimizer.param_groups:
        param_group['lr'] = lr
    return lr
```
This sets every parameter group to the same learning rate derived from `lr_init`.
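A minimal standalone check (toy modules, illustrative numbers) shows the per-group values being erased by that final loop:

```python
import torch

optimizer = torch.optim.Adam(
    [{'params': torch.nn.Linear(2, 2).parameters(), 'lr': 0.002},  # "backbone"
     {'params': torch.nn.Linear(2, 2).parameters()}],              # "new" params
    lr=0.02)
print([g['lr'] for g in optimizer.param_groups])  # [0.002, 0.02]

# The same loop that ends adjust_learning_rate:
lr = 0.02 * 0.5  # e.g. the value computed once epoch >= 10
for param_group in optimizer.param_groups:
    param_group['lr'] = lr
print([g['lr'] for g in optimizer.param_groups])  # [0.01, 0.01] -- the 1/10 ratio is gone
```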
So the optimizer has two parameter groups with different learning rates, as defined in main.py, but in `iterate` the function `adjust_learning_rate` overwrites both groups with the same learning rate.
I think you're right: the parameters are actually all updated with the same learning rate, which is indeed a mistake. The design of different learning rates comes from a common practice in semantic segmentation networks, where the parameters of the pretrained backbone are updated with 1/10 of the learning rate. I don't know whether it actually helps here; maybe you could try it.
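For anyone who wants to try it, one possible fix is sketched below (hypothetical, not the repo's code: the function name, the `'initial_lr'` key, and the shortened schedule are all illustrative). It decays each param group relative to its own starting lr, so the backbone keeps its 1/10 ratio throughout training:

```python
def adjust_learning_rate_per_group(optimizer, epoch):
    """Decay every param group by a shared factor, relative to that group's
    own starting lr, instead of overwriting all groups with one value."""
    if epoch >= 20:
        scale = 0.1
    elif epoch >= 10:
        scale = 0.5
    else:
        scale = 1.0
    for param_group in optimizer.param_groups:
        # Remember each group's starting lr on the first call (so this must
        # first be called at epoch 0, before any decay step).
        param_group.setdefault('initial_lr', param_group['lr'])
        param_group['lr'] = param_group['initial_lr'] * scale
    return [g['lr'] for g in optimizer.param_groups]
```

With the stage-3 optimizer above, the backbone group would then follow 0.002 → 0.001 → 0.0002 while the DA-CSPN++ group follows 0.02 → 0.01 → 0.002.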