LTH14/mar

Is diffusion position embedding necessary?

Opened this issue · 2 comments

Thanks for your excellent work!
I've seen that before the conditioning vector z enters the diffusion model, a positional embedding is added to it, as in the code below:

x = x + self.diffusion_pos_embed_learned
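For context, that line adds a learned per-position table to every condition vector, broadcast over the batch. Here is a minimal numpy sketch of what that addition does; the shapes (B, L, D) and the random array standing in for the learned `nn.Parameter` are illustrative assumptions, not the repo's actual values.

```python
import numpy as np

# Hypothetical shapes: batch B, sequence length L, embedding dim D.
B, L, D = 2, 16, 8

rng = np.random.default_rng(0)
z = rng.standard_normal((B, L, D))  # condition vectors, one per token position

# In the repo this is a learned parameter of shape (1, L, D);
# here a fixed random array stands in for the learned table.
diffusion_pos_embed_learned = rng.standard_normal((1, L, D))

# x = x + self.diffusion_pos_embed_learned broadcasts the
# (1, L, D) table across the batch dimension.
x = z + diffusion_pos_embed_learned
print(x.shape)  # (2, 16, 8)
```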

Is this crucial? Could you share any explanation? I'd really appreciate that.

Thanks for your interest! Actually, this position embedding is not a crucial part 😂. I once thought we needed to tell the DiffLoss which position it is generating, so I added this position embedding. But later I found that the condition z should already contain the position information, because of the position embedding added at the beginning of the decoder. However, since all of our pre-trained models were trained with self.diffusion_pos_embed_learned, I just kept it in the code.
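The point about z already carrying position information can be seen in a toy sketch: if the decoder adds a position embedding to its input, then even identical tokens produce position-dependent outputs. Everything below (the shapes, the random tables, the single linear layer standing in for the decoder) is a hypothetical stand-in, not the actual MAR architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
L, D = 4, 8

# Identical token content at every position (e.g. all mask tokens).
tokens = np.tile(rng.standard_normal((1, D)), (L, 1))

# Stand-ins for the decoder's learned position embedding
# and a single linear layer representing the decoder.
decoder_pos_embed = rng.standard_normal((L, D))
W = rng.standard_normal((D, D))

# Without the decoder's pos embed, every position yields the same z ...
z_no_pos = tokens @ W
# ... with it, z already differs per position, so a second pos embed
# before the DiffLoss adds little the condition does not already carry.
z_with_pos = (tokens + decoder_pos_embed) @ W

print(np.allclose(z_no_pos[0], z_no_pos[1]))      # True
print(np.allclose(z_with_pos[0], z_with_pos[1]))  # False
```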

I found that the condition z should already contain the position information because of the position embedding added at the beginning of the decoder.

That's exactly what I thought! Thanks for your reply.