How to implement weight decay towards the pre-trained model?
sedol1339 opened this issue · 2 comments
sedol1339 commented
Hello, let me ask one question.
When using NeMo for supervised fine-tuning, how can I penalize the distance between the starting (pre-trained) weights and the current weights? This was shown to be effective in https://arxiv.org/abs/1706.03610
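
To make clear what I mean, here is a minimal sketch in plain PyTorch, not NeMo API. The helper names `snapshot_params` and `start_point_penalty`, the coefficient `alpha`, and the toy `nn.Linear` model are all made up for illustration. The idea is to keep a frozen copy of the starting weights and add (alpha / 2) * ||w - w_start||^2 to the task loss, instead of the usual weight decay that pulls the weights towards zero.

```python
import torch
from torch import nn


def snapshot_params(model: nn.Module) -> dict:
    """Detached copy of the trainable starting weights, keyed by parameter name."""
    return {
        name: p.detach().clone()
        for name, p in model.named_parameters()
        if p.requires_grad
    }


def start_point_penalty(model: nn.Module, start: dict, alpha: float) -> torch.Tensor:
    """(alpha / 2) * sum of squared distances between current and starting weights."""
    terms = [
        (p - start[name]).pow(2).sum()
        for name, p in model.named_parameters()
        if name in start
    ]
    return 0.5 * alpha * torch.stack(terms).sum()


torch.manual_seed(0)
model = nn.Linear(4, 2)          # hypothetical stand-in for the pre-trained model
start = snapshot_params(model)   # taken once, before fine-tuning begins

# weight_decay is left at its default of 0 so that the only pull is towards `start`.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

x, y = torch.randn(8, 4), torch.randn(8, 2)  # random toy batch
for _ in range(3):
    optimizer.zero_grad()
    task_loss = nn.functional.mse_loss(model(x), y)
    loss = task_loss + start_point_penalty(model, start, alpha=0.01)
    loss.backward()
    optimizer.step()
```

I assume the penalty would have to be added wherever the model computes its training loss, but I do not know which hook or config option is the right place for this in NeMo.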