kohjingyu/fromage

Question about the frozen language model

sijeh opened this issue · 3 comments

sijeh commented

if self.args.freeze_lm:

Hello kohjingyu, thanks for your great work!
I'm a little confused about the frozen LLM. It seems that all of the parameters in the LLM are frozen. Shouldn't the input_embedding.weight row at the [RET] position be learnable? I could not find code such as self.input_embedding.requires_grad_(True) or self.input_embedding.weight.requires_grad = True. On the other hand, I see that the gradient of input_embedding is adjusted in
param.grad[mask, :] = 0
Please point it out if I have overlooked something important.

Best regards.

kohjingyu commented

Thanks for bringing this up, as it is a subtle point. When the token embeddings are resized (to include the extra [RET] token):

self.lm.resize_token_embeddings(len(tokenizer))

requires_grad is automatically set to True for them (you can verify this during training, when the script prints out the params and whether they are trainable). This is why we have to zero out the gradients of the non-[RET] token rows in the training loop, to prevent them from changing. It's quite complicated to make just the [RET] embedding row trainable, so I elected to do this instead.
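
In case it helps future readers, here is a minimal sketch of that pattern. It is not the repo's actual training code: the model name facebook/opt-125m, the learning rate, and the toy training step are placeholders for illustration only.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Add a [RET] token to an off-the-shelf tokenizer and model (placeholder model).
tokenizer = AutoTokenizer.from_pretrained("facebook/opt-125m")
tokenizer.add_tokens(["[RET]"])
ret_token_idx = tokenizer.convert_tokens_to_ids("[RET]")

lm = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")

# Freeze every pre-trained LM parameter.
for param in lm.parameters():
    param.requires_grad = False

# Resizing the token embeddings builds a new embedding matrix whose weight
# has requires_grad=True again, even though the rest of the LM stays frozen.
lm.resize_token_embeddings(len(tokenizer))
input_embeddings = lm.get_input_embeddings()
assert input_embeddings.weight.requires_grad

optimizer = torch.optim.AdamW(
    [p for p in lm.parameters() if p.requires_grad], lr=1e-4)

# Toy training step: after backward(), zero the gradient rows of every token
# except [RET], so only the [RET] embedding row is actually updated.
input_ids = tokenizer("a caption ending in [RET]", return_tensors="pt").input_ids
loss = lm(input_ids, labels=input_ids).loss
loss.backward()

mask = torch.arange(input_embeddings.weight.shape[0]) != ret_token_idx
input_embeddings.weight.grad[mask, :] = 0
optimizer.step()
optimizer.zero_grad()

The alternative (keeping a separate single-row trainable nn.Embedding just for [RET] and stitching it into the input embeddings on every forward pass) avoids the gradient masking but adds more bookkeeping, which is the trade-off described above.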

Hope that answers your question!

sijeh commented

That's exactly the point I overlooked. Thanks for your kind reply.