songweige/TATS

The Training of Interpolation Transformer

Closed this issue · 1 comment

Dear author:

In the training of the Interpolation Transformer, given that the latent space is 5 * 16 * 16, I found that the first 16 * 16 and the last 16 * 16 tokens participate in gradient propagation. But during inference with the Interpolation Transformer, the first and last 16 * 16 tokens are given as conditioning. So, in my opinion, the first and last 16 * 16 tokens should not take part in gradient back-propagation during training. Please correct me if I'm wrong.

Kang

Hi Kang, good point. I think you are right. I suspect that masking out the loss on the initial and last frames shouldn't affect the model performance.
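
For reference, here is a minimal sketch of what such loss masking could look like, assuming a PyTorch-style setup where the transformer's logits are already aligned with the target token indices. The 5 * 16 * 16 layout comes from the discussion above; the function and variable names are hypothetical and not from the TATS codebase.

```python
import torch
import torch.nn.functional as F

# Latent layout from the discussion: 5 frames of 16 x 16 tokens each.
T, H, W = 5, 16, 16
tokens_per_frame = H * W
seq_len = T * tokens_per_frame


def interpolation_loss(logits, targets):
    """Cross-entropy over the token sequence, excluding conditioning frames.

    logits:  (batch, seq_len, vocab_size) transformer outputs
    targets: (batch, seq_len) ground-truth token indices
    The first and last frames are given at inference time, so their
    positions are dropped from the loss and contribute no gradient.
    """
    loss_mask = torch.ones(seq_len, dtype=torch.bool, device=targets.device)
    loss_mask[:tokens_per_frame] = False   # first frame is conditioning
    loss_mask[-tokens_per_frame:] = False  # last frame is conditioning

    # Positions outside the mask get ignore_index, so cross_entropy skips them.
    masked_targets = targets.masked_fill(~loss_mask, -100)
    return F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),
        masked_targets.reshape(-1),
        ignore_index=-100,
    )
```

Whether the logits need an extra shift relative to the targets depends on how the autoregressive training loop is set up in the actual code.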