Doesn't it need a backpropagation process?

Question

Opened this issue 3 years ago · 4 comments

I can't find any backpropagation step in the code, and I'm curious how the stochastic embedding is optimized. Am I misunderstanding the paper, or is the code incomplete? Thanks for answering my question.

Answer 1 · 2021-09-06T12:55:14.000Z

Yes, I ran into this problem too. When updating the model, the learned embedding remains unchanged! I also wonder how to update the soft embedding...

Answer 2 · 2021-11-03T03:41:54.000Z

I'd also like to know this. As-is it looks like the code just feeds the original embedding (or another model's embedding) back into the model, which doesn't sound right.

Answer 3 · 2021-11-03T14:41:36.000Z

PyTorch handles the whole backpropagation process; you just need to specify which parameters you want to update.

import torch.optim as optim

model.set_input_embeddings(s_wte)
# After swapping in the soft embedding, pass only the learned embedding to the optimizer so it is the only tensor that gets updated
optimizer = optim.Adam([model.transformer.wte.learned_embedding])

Also, I'm not passing a reference to the original embedding. I'm just initializing the learned embedding from the original embedding by cloning its weights (hopefully for a better initialization). The paper does it somewhat differently, but I think it's the same idea.
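To make the point above concrete, here is a minimal sketch of one training step. It is not the repo's code: `ToySoftEmbedding`, the linear `head`, the sizes, and the squared-output loss are all hypothetical stand-ins chosen for illustration. It only assumes standard PyTorch behavior, namely that `loss.backward()` computes the gradients and `optimizer.step()` updates just the tensors handed to the optimizer.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)


class ToySoftEmbedding(nn.Module):
    """Hypothetical stand-in for a soft embedding: vocab table plus a trainable prompt."""

    def __init__(self, wte: nn.Embedding, n_tokens: int = 4):
        super().__init__()
        self.wte = wte
        # Initialise the prompt from a clone of existing rows, not a reference to them.
        self.learned_embedding = nn.Parameter(wte.weight[:n_tokens].clone().detach())

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        input_embedding = self.wte(tokens)  # (batch, seq, dim)
        learned = self.learned_embedding.unsqueeze(0).expand(tokens.size(0), -1, -1)
        return torch.cat([learned, input_embedding], dim=1)  # prompt is prepended


wte = nn.Embedding(10, 8)
soft = ToySoftEmbedding(wte, n_tokens=4)
head = nn.Linear(8, 1)  # toy replacement for the rest of the model

# Only the learned embedding is registered with the optimizer.
optimizer = torch.optim.Adam([soft.learned_embedding], lr=0.1)

before_prompt = soft.learned_embedding.detach().clone()
before_vocab = wte.weight.detach().clone()

tokens = torch.randint(0, 10, (2, 5))
loss = head(soft(tokens)).pow(2).mean()  # arbitrary toy loss

optimizer.zero_grad()
loss.backward()   # autograd runs backpropagation here
optimizer.step()  # updates only the tensors given to the optimizer

prompt_changed = not torch.equal(before_prompt, soft.learned_embedding.detach())
vocab_changed = not torch.equal(before_vocab, wte.weight.detach())
print(prompt_changed, vocab_changed)
```

In this sketch the vocabulary table still receives a gradient during `loss.backward()`; it just never gets applied, because the optimizer was not given that tensor.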

Answer 4 · 2021-11-11T03:06:34.000Z

I think it is better to freeze the parameters you are not training, to reduce gradient computations. Use something like https://discuss.huggingface.co/t/how-to-freeze-layers-using-trainer/4702/3
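As a rough sketch of what freezing could look like, the snippet below sets `requires_grad` to `False` on every parameter of a toy model and leaves only a separate prompt tensor trainable. The `model`, `prompt`, and loss here are hypothetical placeholders, not taken from the repo or from the linked thread; the only assumption is the usual PyTorch behavior that autograd does not accumulate gradients for tensors with `requires_grad=False`.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy stand-in for a pretrained model: an embedding table followed by a linear layer.
model = nn.Sequential(nn.Embedding(10, 8), nn.Linear(8, 1))
# Hypothetical learned prompt, kept outside the frozen model.
prompt = nn.Parameter(torch.randn(4, 8))

# Freeze every pretrained parameter so autograd skips their gradients.
for param in model.parameters():
    param.requires_grad_(False)

tokens = torch.randint(0, 10, (2, 5))
embeds = torch.cat(
    [prompt.unsqueeze(0).expand(tokens.size(0), -1, -1), model[0](tokens)], dim=1
)
loss = model[1](embeds).pow(2).mean()  # arbitrary toy loss
loss.backward()

frozen_have_no_grad = all(p.grad is None for p in model.parameters())
prompt_has_grad = prompt.grad is not None
print(frozen_have_no_grad, prompt_has_grad)
```

With the frozen parameters excluded from autograd, only `prompt` would need to be passed to the optimizer.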