Working with images and text embeddings of different shape

Question

AhmedGamal411 opened this issue 2 years ago · 0 comments

I was wondering whether I can use latents from a VAE as the input to the Imagen UNet, as latent diffusion models do. The issue is that a VAE changes the shape of the image (e.g. a 1-dimensional array or a 4-channel image). What would I have to change to make that work?
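To make the shape mismatch concrete, here is a minimal sketch of how a convolutional VAE would change the tensor the UNet sees. The helper `latent_shape` is hypothetical, and the defaults (4 latent channels, 8x spatial downsampling) are assumptions typical of latent diffusion VAEs, not values taken from this repository.

```python
def latent_shape(image_shape, latent_channels=4, downsample_factor=8):
    """Return the (channels, height, width) of a VAE latent for a (C, H, W) image.

    latent_channels and downsample_factor are assumed values, typical of
    latent diffusion VAEs; a real VAE may use different ones.
    """
    _, height, width = image_shape
    if height % downsample_factor or width % downsample_factor:
        raise ValueError("height and width must be divisible by the downsample factor")
    return (latent_channels, height // downsample_factor, width // downsample_factor)


image = (3, 256, 256)          # RGB image the UNet is normally built for
latent = latent_shape(image)   # what the UNet would receive instead
print(image, "->", latent)
```

Under these assumptions the UNet would need to accept 4 input channels instead of 3, and its configured image size would have to match the smaller latent resolution.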

I was also wondering about changing the text embeddings used. The issue there is the same: they might have a different shape. Is it feasible to use other embeddings with minimal code changes?
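One approach I can imagine for the embedding mismatch is a learned linear projection in front of the UNet, sketched below in plain PyTorch. `TextEmbedProjector` is a hypothetical module, and the sizes (512 in, 768 out, 77 tokens) are placeholders for whatever the alternative encoder produces and whatever the UNet's text conditioning expects; I do not know whether this is how the library would want it done.

```python
import torch
from torch import nn


class TextEmbedProjector(nn.Module):
    """Hypothetical adapter: maps embeddings of size source_dim to target_dim."""

    def __init__(self, source_dim, target_dim):
        super().__init__()
        # nn.Linear acts on the last dimension, so batch and token axes are preserved
        self.proj = nn.Linear(source_dim, target_dim)

    def forward(self, embeds):
        # embeds: (batch, tokens, source_dim) -> (batch, tokens, target_dim)
        return self.proj(embeds)


# Placeholder sizes, not taken from any specific encoder or from Imagen
projector = TextEmbedProjector(source_dim=512, target_dim=768)
other_embeds = torch.randn(2, 77, 512)   # stand-in for another encoder's output
projected = projector(other_embeds)
print(projected.shape)
```

The projection weights would have to be trained together with the UNet, since a randomly initialised layer carries no meaning on its own.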

Thank you!