songweige/TATS

The Training of Interpolation Transformer

Closed this issue · 1 comment

Dear author:

In the training of the Interpolation Transformer, given that the latent space is 5 * 16 * 16, I found that the first 16 * 16 and the last 16 * 16 tokens participate in gradient propagation. But during inference with the Interpolation Transformer, the first and last 16 * 16 tokens are given as conditioning. So, in my opinion, the first and last 16 * 16 tokens should not take part in gradient back-propagation during training. Please correct me if I'm wrong.

Kang

Hi Kang, good point. I think you are right. I suspect that masking out the loss on the initial and last frames shouldn't affect the model performance.
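
For reference, here is a minimal sketch of what such loss masking could look like, assuming a PyTorch-style setup where the transformer's logits are already aligned with the target token indices. The 5 * 16 * 16 layout comes from the discussion above; the function and variable names are hypothetical and not from the TATS codebase.

```python
import torch
import torch.nn.functional as F

# Latent layout from the discussion: 5 frames of 16 x 16 tokens each.
T, H, W = 5, 16, 16
tokens_per_frame = H * W
seq_len = T * tokens_per_frame


def interpolation_loss(logits, targets):
    """Cross-entropy over the token sequence, excluding conditioning frames.

    logits:  (batch, seq_len, vocab_size) transformer outputs
    targets: (batch, seq_len) ground-truth token indices
    The first and last frames are given at inference time, so their
    positions are dropped from the loss and contribute no gradient.
    """
    loss_mask = torch.ones(seq_len, dtype=torch.bool, device=targets.device)
    loss_mask[:tokens_per_frame] = False   # first frame is conditioning
    loss_mask[-tokens_per_frame:] = False  # last frame is conditioning

    # Positions outside the mask get ignore_index, so cross_entropy skips them.
    masked_targets = targets.masked_fill(~loss_mask, -100)
    return F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),
        masked_targets.reshape(-1),
        ignore_index=-100,
    )
```

Whether the logits need an extra shift relative to the targets depends on how the autoregressive training loop is set up in the actual code.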