A question about the textsliders code in "train_lora_xl.py"

yatoubusha opened this issue 7 months ago · 0 comments

I noticed that during training, the LoRA-equipped model is first used to infer denoised_latents from randomly initialized latents.

Then the frozen SD model is used to predict noise again, this time on denoised_latents. But denoised_latents is already the denoised image, so what is the principle behind predicting noise on it a second time? Why not predict the noise directly on the randomly initialized latents?
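To make the flow I am asking about concrete, here is a minimal sketch of the two steps as I understand them. It is not the code from train_lora_xl.py: `toy_unet`, `denoise_with_lora`, `lora_scale`, and `num_steps` are hypothetical stand-ins I made up for illustration, and the "model" is just a scaling function so the snippet runs with NumPy alone.

```python
import numpy as np

rng = np.random.default_rng(0)

def toy_unet(latents, t, lora_scale=0.0):
    # Hypothetical stand-in for a UNet noise prediction (not a real model).
    # lora_scale=0.0 plays the role of the frozen SD model,
    # lora_scale>0.0 plays the role of the model with the LoRA applied.
    return 0.1 * latents * (1.0 + lora_scale)

def denoise_with_lora(latents, num_steps, lora_scale):
    # Step 1: run some denoising steps with the LoRA active.
    for t in range(num_steps):
        latents = latents - toy_unet(latents, t, lora_scale)
    return latents

# Randomly initialized latents (shape chosen arbitrarily for the sketch).
latents = rng.standard_normal((1, 4, 8, 8))

# Step 1: denoised_latents are inferred from the random latents with the LoRA.
denoised_latents = denoise_with_lora(latents, num_steps=3, lora_scale=1.0)

# Step 2: the frozen model (LoRA disabled) predicts noise on denoised_latents.
target_noise = toy_unet(denoised_latents, t=3, lora_scale=0.0)
```

My question is about step 2: why the noise prediction is taken on `denoised_latents` rather than on `latents`.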