NVIDIA/NeMo

How to implement weight decay towards the pre-trained model?

sedol1339 opened this issue · 2 comments

Hello, I have a question.

If I use NeMo for supervised fine-tuning, how can I add a penalty on the distance between the starting (pre-trained) weights and the current weights? This was shown to be effective in https://arxiv.org/abs/1706.03610
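
To be concrete about what I mean, here is a minimal sketch in plain PyTorch, not NeMo-specific. `StartingPointPenalty` and `strength` are hypothetical names made up for illustration, the `nn.Linear` model is only a stand-in for the pre-trained network, and the squared L2 form of the penalty is my assumption about the simplest variant.

```python
import torch
from torch import nn


class StartingPointPenalty:
    """L2 penalty on the distance between current and starting weights.

    Hypothetical helper for illustration; not part of NeMo or PyTorch.
    """

    def __init__(self, model: nn.Module, strength: float = 0.01):
        self.strength = strength
        # Frozen copies of the trainable weights, taken before fine-tuning starts.
        self.start = {
            name: p.detach().clone()
            for name, p in model.named_parameters()
            if p.requires_grad
        }

    def __call__(self, model: nn.Module) -> torch.Tensor:
        # strength / 2 * sum over parameters of ||w - w_start||^2
        sq_dist = sum(
            (p - self.start[name]).pow(2).sum()
            for name, p in model.named_parameters()
            if name in self.start
        )
        return 0.5 * self.strength * sq_dist


model = nn.Linear(4, 2)  # stand-in for the pre-trained network
penalty = StartingPointPenalty(model, strength=0.01)

optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
x, y = torch.randn(8, 4), torch.randn(8, 2)  # dummy batch

# Task loss plus the pull towards the starting weights.
loss = nn.functional.mse_loss(model(x), y) + penalty(model)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

Since NeMo models are PyTorch Lightning modules, I assume a term like this could be added to the loss inside `training_step`, but I don't know whether there is a supported way to do it.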