tonyzhaozh/act

DETR VAE uses output of first layer of transformer decoder?


In

hs = self.transformer(src, None, self.query_embed.weight, pos, latent_input, proprio_input, self.additional_pos_embed.weight)[0]
and
hs = self.transformer(transformer_input, None, self.query_embed.weight, self.pos.weight)[0]
you index the output of the transformer with [0]. Doesn't this select the output of the first layer of the transformer decoder rather than the last layer? Is this behaviour expected?
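
For concreteness, a minimal sketch of the indexing in question, assuming the transformer returns the stacked per-layer decoder outputs (as with return_intermediate_dec=True in DETR-style code) with shape [num_decoder_layers, batch, num_queries, hidden_dim]. The tensor and dimensions below are illustrative, not taken from the repo:

import torch

# Illustrative dimensions only (not the repo's actual values).
num_dec_layers, batch, num_queries, hidden_dim = 7, 2, 100, 512

# Stand-in for the stacked intermediate decoder outputs a DETR-style
# transformer returns: [num_decoder_layers, batch, num_queries, hidden_dim].
hs = torch.randn(num_dec_layers, batch, num_queries, hidden_dim)

first_layer_out = hs[0]   # what indexing with [0] selects
last_layer_out = hs[-1]   # the final decoder layer's output

print(first_layer_out.shape)  # torch.Size([2, 100, 512])
print(last_layer_out.shape)   # torch.Size([2, 100, 512])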

CarlDegio commented

Agreed. I removed the subsequent decoder layers and it had no effect on training.

uuu686 commented

@CarlDegio Hello, I changed the index from 0 to -1 so that the output of the last layer is used, but there was no obvious difference in the results. May I ask whether your experiment showed a clear effect?

I did not test the effect of a multi-layer decoder. I only removed the decoder forward passes that were unused in the original code, to speed up training. @uuu686
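
One way to get that kind of speed-up is to build the decoder with a single layer rather than computing several layers and then discarding all but one. The sketch below uses PyTorch's generic nn.TransformerDecoder as a stand-in; it is not the repo's code, only an illustration of the idea:

import torch
from torch import nn

hidden_dim, nheads = 512, 8

# A single decoder layer; extra layers would only add computation whose
# outputs are never consumed if downstream code reads one layer's output.
decoder_layer = nn.TransformerDecoderLayer(d_model=hidden_dim, nhead=nheads)
single_layer_decoder = nn.TransformerDecoder(decoder_layer, num_layers=1)

tgt = torch.randn(100, 2, hidden_dim)     # (num_queries, batch, hidden_dim)
memory = torch.randn(300, 2, hidden_dim)  # (encoder tokens, batch, hidden_dim)

out = single_layer_decoder(tgt, memory)
print(out.shape)  # torch.Size([100, 2, 512])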