Question about the textsliders code in "train_lora_xl.py"
Opened this issue · 0 comments
yatoubusha commented
I noticed that during training, the code first uses the LoRA-enabled network to infer denoised_latents from randomly initialized latents.
Then the frozen SD model predicts noise again, this time taking denoised_latents as input. If denoised_latents is already the denoised image, what is the principle behind predicting noise a second time? Why not predict noise directly for the randomly initialized latents?
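
To make the question concrete, here is a rough sketch of the two stages as I understand them. This is not the actual code from train_lora_xl.py: `toy_predict_noise`, `denoise`, `lora_scale`, and `num_steps` are hypothetical stand-ins, the "model" is just a scaled copy of its input, and how many steps the real denoising loop runs is an assumption on my part.

```python
import random

def toy_predict_noise(latents, lora_scale=0.0):
    # Hypothetical stand-in for the UNet, NOT the real model.
    # lora_scale=0.0 plays the role of the frozen SD network;
    # a nonzero value plays the role of the LoRA-enabled network.
    return [(0.1 + lora_scale) * x for x in latents]

def denoise(latents, num_steps, lora_scale):
    # Subtract the predicted noise once per step (toy update rule).
    for _ in range(num_steps):
        noise = toy_predict_noise(latents, lora_scale)
        latents = [x - n for x, n in zip(latents, noise)]
    return latents

random.seed(0)
latents = [random.gauss(0.0, 1.0) for _ in range(16)]  # randomly initialized latents

# Stage 1: the LoRA-enabled network turns random latents into denoised_latents.
# num_steps=3 is an assumed value for illustration only.
denoised_latents = denoise(latents, num_steps=3, lora_scale=0.05)

# Stage 2: the frozen network predicts noise again, now on denoised_latents.
noise_pred = toy_predict_noise(denoised_latents, lora_scale=0.0)

print(len(denoised_latents), len(noise_pred))
```

My question is about stage 2: why is the noise prediction taken on `denoised_latents` rather than on `latents`?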