Question about the IP2P+

mengVV opened this issue 8 months ago · 1 comments

I am wondering how you implement IP2P+. Is the mask derived from the difference between the conditional and unconditional outputs after first_stage_decode, or in the latent space before first_stage_decode? Could you please explain this in detail?

Answer 1 · 2024-04-12T06:22:06.000Z

Note that IP2P+ encodes only once (at the beginning) and decodes only once (at the end). The repeated denoising and diffusion steps all operate in latent space. Hence, the mask is derived in the latent space, before first_stage_decode, and it is obtained from the first denoising result.
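
To make the idea concrete, here is a minimal sketch of deriving a mask in latent space from the difference between a conditional and an unconditional prediction. It is not the repository's actual code: the function name `latent_difference_mask`, the channel averaging, the min-max normalization, and the `threshold` value are all assumptions for illustration, and NumPy arrays stand in for the real latent tensors.

```python
import numpy as np


def latent_difference_mask(eps_cond, eps_uncond, threshold=0.5):
    """Hypothetical helper: binary mask from the gap between two predictions.

    eps_cond, eps_uncond: arrays of shape (C, H, W) in latent space,
        e.g. the conditional and unconditional outputs of the first
        denoising step (before any first_stage_decode call).
    threshold: assumed cut-off on the normalized difference; the thread
        does not state how the mask is binarized.
    """
    # Per-position difference magnitude, averaged over latent channels.
    diff = np.abs(eps_cond - eps_uncond).mean(axis=0)  # shape (H, W)

    # Min-max normalize to [0, 1]; an all-equal map yields an empty mask.
    span = diff.max() - diff.min()
    if span == 0:
        return np.zeros(diff.shape, dtype=bool)
    normalized = (diff - diff.min()) / span

    return normalized >= threshold


# Toy usage: the two predictions differ only inside a 3x3 latent region.
rng = np.random.default_rng(0)
eps_uncond = rng.normal(size=(4, 8, 8))
eps_cond = eps_uncond.copy()
eps_cond[:, 2:5, 3:6] += 1.0

mask = latent_difference_mask(eps_cond, eps_uncond)
print(mask.astype(int))
```

In this toy case the mask marks only the region where the two predictions disagree. Because it is computed once from the first denoising result, the same mask could then be reused for the remaining latent-space steps, with decoding happening only at the very end.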