kohjingyu/fromage

Freezing the final linear layer when adding a new token [RET]

ptirupat opened this issue · 1 comment

Hello,

Thank you for releasing the code for your paper. It is fascinating work. I have one question specific to the implementation.

When the [RET] token is added, the embedding layer is updated along with the final classification layer. Specifically, the output dimension of the FC layer grows to 32001. However, you freeze all the layers in the LLM. How does next-token prediction training work if those layers are frozen? For concreteness, here is a minimal sketch of the setup I mean (a sketch below; the model name and API calls are illustrative Hugging Face usage, not the repo's exact code):
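```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical base checkpoint; any HF causal LM behaves the same way here.
tokenizer = AutoTokenizer.from_pretrained("facebook/opt-125m")
model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")

# Freeze every parameter of the LLM.
for param in model.parameters():
    param.requires_grad = False

# Add [RET] and resize: the vocabulary (and hence the output dimension of the
# final FC layer) grows by one, e.g. 32000 -> 32001 for a 32k-vocab model.
tokenizer.add_tokens("[RET]")
model.resize_token_embeddings(len(tokenizer))
```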

The embedding matrix and the lm_head layer are unfrozen when the LM's token embeddings are resized, so they are the only parameters updated by the next-token prediction loss. More details in #6 (comment)
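As a minimal sketch of that behavior (continuing the snippet above; again illustrative Hugging Face usage, not the repo's exact code): `resize_token_embeddings` rebuilds the embedding matrix (and the untied `lm_head`, if present) as fresh modules, and re-enabling `requires_grad` on them explicitly leaves only those matrices trainable:

```python
# After model.resize_token_embeddings(...), make sure only the token
# embeddings and the output head are trainable; everything else stays frozen.
model.get_input_embeddings().weight.requires_grad = True
if model.get_output_embeddings() is not None:
    model.get_output_embeddings().weight.requires_grad = True

# Sanity check: gradients still flow *through* the frozen transformer blocks
# (their activations are needed for backprop), but only these parameters
# receive optimizer updates under the next-token prediction loss.
for name, param in model.named_parameters():
    if param.requires_grad:
        print(name)
```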

Hope that helps!