Lightning-Universe/lightning-transformers

Optimizers without `weight_decay` produce errors

pietrolesci opened this issue · 1 comment

Hi there,

I am trying to switch the optimizer in the following example from the README

python train.py task=nlp/summarization dataset=nlp/summarization/xsum trainer.gpus=1

to

python train.py task=nlp/summarization dataset=nlp/summarization/xsum trainer.gpus=1 optimizer=adamax

and I get an error saying that the `weight_decay` configuration is not present. I think the source of the error is

https://github.com/PyTorchLightning/lightning-transformers/blob/aa8f48addc9d16733cfb7572cf19cba17cad29a6/lightning_transformers/core/instantiator.py#L81
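
For context, my understanding is that the `adamax` optimizer config simply does not define a `weight_decay` entry, so any code that reads `cfg.weight_decay` unconditionally fails. A minimal, hypothetical stand-in for such a config (the exact keys are my assumption, not copied from the repo):

# Hypothetical stand-in for what `optimizer=adamax` resolves to; the exact keys
# are an assumption -- the point is only that "weight_decay" is absent.
from omegaconf import DictConfig, OmegaConf

cfg: DictConfig = OmegaConf.create({"_target_": "torch.optim.Adamax", "lr": 2e-3})

print("weight_decay" in cfg)  # False: the grouped-parameter code that reads cfg.weight_decay then fails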

As a fix, I made the following change locally:

def optimizer(self, model: torch.nn.Module, cfg: DictConfig) -> torch.optim.Optimizer:
    # Only build the decay / no-decay parameter groups when the optimizer
    # config actually defines a weight_decay entry.
    if "weight_decay" in cfg:
        no_decay = ["bias", "LayerNorm.weight"]
        grouped_parameters = [
            {
                "params": [
                    p
                    for n, p in model.named_parameters()
                    if not any(nd in n for nd in no_decay) and p.requires_grad
                ],
                "weight_decay": cfg.weight_decay,
            },
            {
                "params": [
                    p
                    for n, p in model.named_parameters()
                    if any(nd in n for nd in no_decay) and p.requires_grad
                ],
                "weight_decay": 0.0,
            },
        ]
        return self.instantiate(cfg, grouped_parameters)

    # Optimizers without weight_decay just receive the trainable parameters.
    return self.instantiate(cfg, filter(lambda p: p.requires_grad, model.parameters()))
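
To sanity-check the fallback outside of the trainer, here is a small standalone sketch of the same branching logic against a toy model; the optimizer class and hyper-parameter values are arbitrary and only meant to exercise both paths, not taken from the repo:

# Standalone sketch of the fallback logic above, not tied to the Instantiator class.
import torch
from omegaconf import OmegaConf

model = torch.nn.Linear(4, 2)

for cfg in (OmegaConf.create({"lr": 1e-3, "weight_decay": 0.01}),  # config with weight_decay
            OmegaConf.create({"lr": 1e-3})):                       # config without it
    if "weight_decay" in cfg:
        # Exclude biases and LayerNorm weights from decay, as in the patch above.
        no_decay = ["bias", "LayerNorm.weight"]
        params = [
            {"params": [p for n, p in model.named_parameters()
                        if not any(nd in n for nd in no_decay) and p.requires_grad],
             "weight_decay": cfg.weight_decay},
            {"params": [p for n, p in model.named_parameters()
                        if any(nd in n for nd in no_decay) and p.requires_grad],
             "weight_decay": 0.0},
        ]
    else:
        params = [p for p in model.parameters() if p.requires_grad]
    optimizer = torch.optim.Adamax(params, lr=cfg.lr)
    print(type(optimizer).__name__, len(optimizer.param_groups))  # Adamax 2, then Adamax 1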
stale commented

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.