Question about the IP2P+
mengVV opened this issue · 1 comments
mengVV commented
I am wondering how IP2P+ is implemented. Is the mask derived from the difference between the conditional and unconditional outputs after first_stage_decode, or in the latent space before first_stage_decode? Could you please explain this in detail?
Ysz2022 commented
Note that IP2P+ encodes only once (at the beginning) and decodes only once (at the end). The repeated denoising and diffusion steps all operate in latent space. Hence, the mask is derived in the latent space, and it is obtained from the first denoising result.
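
For readers who want something concrete, here is a minimal sketch of how a latent-space mask of this kind could be computed. It is not the repository's actual code. It assumes the mask comes from the absolute difference between the conditional and unconditional noise predictions at the first denoising step, averaged over channels, min-max normalised, and thresholded. The function names `latent_difference_mask` and `blend_latents`, the `threshold` parameter, and the blending step are all hypothetical and chosen only for illustration.

```python
import numpy as np


def latent_difference_mask(eps_cond, eps_uncond, threshold=0.5):
    """Binary mask from the gap between two latent-space predictions.

    Hypothetical helper: both inputs are assumed to have shape (C, H, W)
    and to come from the first denoising step, before any decoding.
    """
    # Per-position disagreement, averaged over latent channels -> (H, W).
    diff = np.abs(eps_cond - eps_uncond).mean(axis=0)
    span = diff.max() - diff.min()
    if span > 0:
        norm = (diff - diff.min()) / span  # min-max normalise to [0, 1]
    else:
        norm = np.zeros_like(diff)  # identical predictions: empty mask
    return (norm >= threshold).astype(np.float32)


def blend_latents(z_edit, z_src, mask):
    """Keep edited latents inside the mask and source latents outside it."""
    m = mask[None]  # broadcast the (H, W) mask over the channel axis
    return m * z_edit + (1.0 - m) * z_src


# Toy example: the two predictions disagree only in a 2x2 latent patch.
eps_uncond = np.zeros((4, 8, 8))
eps_cond = np.zeros((4, 8, 8))
eps_cond[:, 2:4, 2:4] = 1.0

mask = latent_difference_mask(eps_cond, eps_uncond)
blended = blend_latents(np.ones((4, 8, 8)), np.zeros((4, 8, 8)), mask)
```

Because everything here happens on (C, H, W) latents, the mask has the latent resolution rather than the image resolution, which matches the point that decoding happens only once, at the end.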