What is the function of the learnable positional_embedding in the class DiffusionSceneLayout_DDPM?
While reading the training code, I found a learnable positional_embedding that is passed to the first block of Unet1D's downs, mid_blocks, and ups.
For example, in Unet1D's downs:
for block0, block1, attncross, block2, attn, downsample in self.downs:
    x = block0(x, context)  # context is the learnable instance embedding
    x = block1(x, t)        # t is the diffusion timestep embedding
    h.append(x)
    x = attncross(x, context_cross) if self.text_condition else attncross(x)
    x = block2(x, t)
    x = attn(x)
    h.append(x)
    x = downsample(x)
Here, context is the instan_condition_f computed by the following code:
instance_indices = torch.arange(self.sample_num_points).long().to(self.device)[None, :].repeat(batch_size, 1)
instan_condition_f = self.positional_embedding[instance_indices, :]
I am wondering what the function of this positional_embedding is. Thanks for your help.
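For context, here is a minimal, self-contained sketch of what that lookup does (the sizes below are hypothetical, not taken from the repo): a trainable table with one row per object slot is indexed so that every batch element receives the same per-slot vectors.

```python
import torch
import torch.nn as nn

# Hypothetical sizes (not from the repo): 12 object slots, 64-dim features.
sample_num_points, feat_dim, batch_size = 12, 64, 2

# A learnable lookup table: one trainable vector per object slot.
positional_embedding = nn.Parameter(torch.randn(sample_num_points, feat_dim))

# Same indexing pattern as in the question: every batch element gets
# indices 0..sample_num_points-1, one index per object instance.
instance_indices = torch.arange(sample_num_points).long()[None, :].repeat(batch_size, 1)
instan_condition_f = positional_embedding[instance_indices, :]

print(tuple(instan_condition_f.shape))  # (2, 12, 64)
```

Since every batch element uses the same indices, all rows of the batch receive identical embedding vectors; only the index within the sequence matters.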
In the initial implementation of Unet1D from 'https://github.com/lucidrains/denoising-diffusion-pytorch/blob/main/denoising_diffusion_pytorch/denoising_diffusion_pytorch_1d.py', there is no block0 and attncross. is that one of the innovations of this paper?
The instance embedding encodes the position of each object instance within the sequence.
It helps the denoiser differentiate between different object instances.
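To illustrate this point with a toy sketch (not code from this repo): if two object tokens happen to have identical features, a permutation-equivariant denoiser maps them to identical outputs. Adding a distinct per-slot embedding breaks that symmetry, so each instance carries a unique identity.

```python
import torch

# Two object instances whose noisy features happen to be identical.
x = torch.zeros(1, 2, 4)  # (batch, num_instances, feature_dim)

# A per-instance embedding (fixed here for illustration; learnable in the model).
pos = torch.tensor([[1.0, 0.0, 0.0, 0.0],
                    [0.0, 1.0, 0.0, 0.0]])

# Without the embedding, the two slots are indistinguishable...
print(torch.equal(x[0, 0], x[0, 1]))  # True

# ...with it, each slot is unique, so the denoiser can treat them differently.
x_pos = x + pos[None]
print(torch.equal(x_pos[0, 0], x_pos[0, 1]))  # False
```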
Thank you very much for your help!