Optimizers without `weight_decay` produce errors
pietrolesci opened this issue · 1 comment
pietrolesci commented
Hi there,
I am trying to switch the optimizer in the following example from the README

```bash
python train.py task=nlp/summarization dataset=nlp/summarization/xsum trainer.gpus=1
```

to

```bash
python train.py task=nlp/summarization dataset=nlp/summarization/xsum trainer.gpus=1 optimizer=adamax
```
and I get an error saying that the `weight_decay` configuration key is not present. I think the source of the error is the `optimizer` method, which unconditionally reads `cfg.weight_decay` when building the parameter groups.
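To make the failure concrete, here is a minimal sketch of the missing-key access; the config contents below are assumptions for illustration, not the project's actual optimizer YAML files:

```python
# Minimal repro sketch: reading a key that the composed optimizer config does not define.
# The config contents are assumptions, not the project's actual optimizer YAML files.
from omegaconf import OmegaConf

adamw_cfg = OmegaConf.create({"lr": 1e-3, "weight_decay": 0.01})
adamax_cfg = OmegaConf.create({"lr": 1e-3})  # no weight_decay key
OmegaConf.set_struct(adamax_cfg, True)       # Hydra-composed configs are struct by default

print(adamw_cfg.weight_decay)   # 0.01
print(adamax_cfg.weight_decay)  # raises ConfigAttributeError: key not in struct config
```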
As a fix, I did the following locally:
```python
def optimizer(self, model: torch.nn.Module, cfg: DictConfig) -> torch.optim.Optimizer:
    # Only build the no-decay parameter groups when the optimizer config defines weight_decay.
    if "weight_decay" in cfg:
        # Biases and LayerNorm weights are conventionally excluded from weight decay.
        no_decay = ["bias", "LayerNorm.weight"]
        grouped_parameters = [
            {
                "params": [
                    p for n, p in model.named_parameters() if not any(nd in n for nd in no_decay) and p.requires_grad
                ],
                "weight_decay": cfg.weight_decay,
            },
            {
                "params": [
                    p for n, p in model.named_parameters() if any(nd in n for nd in no_decay) and p.requires_grad
                ],
                "weight_decay": 0.0,
            },
        ]
        return self.instantiate(cfg, grouped_parameters)
    # Fall back to plain trainable parameters for configs without weight_decay (e.g. adamax).
    return self.instantiate(cfg, filter(lambda p: p.requires_grad, model.parameters()))
```
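For a quick, framework-free sanity check of the two branches, here is a minimal sketch with plain torch; the choice of AdamW and Adamax here is illustrative and not taken from the project's configs:

```python
# Standalone check of both code paths with plain torch; AdamW/Adamax are used as
# stand-ins for configs with and without a weight_decay key.
import torch

model = torch.nn.Linear(4, 2)
no_decay = ["bias", "LayerNorm.weight"]
grouped_parameters = [
    {
        "params": [p for n, p in model.named_parameters()
                   if not any(nd in n for nd in no_decay) and p.requires_grad],
        "weight_decay": 0.01,
    },
    {
        "params": [p for n, p in model.named_parameters()
                   if any(nd in n for nd in no_decay) and p.requires_grad],
        "weight_decay": 0.0,
    },
]

# weight_decay branch: two parameter groups, biases get no decay.
adamw = torch.optim.AdamW(grouped_parameters, lr=1e-3)
# fallback branch: a config without weight_decay just gets the trainable parameters.
adamax = torch.optim.Adamax(filter(lambda p: p.requires_grad, model.parameters()), lr=1e-3)

print(len(adamw.param_groups), len(adamax.param_groups))  # 2 1
```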
stale commented
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.