Can you please explain how the parameters loss_weight_feat, loss_weight_enc, and loss_weight_clip are helpful?
srinivaspavan9 opened this issue · 6 comments
I would like to know how the above-stated parameters impact the final generated image. Can you please explain?
The generation and manipulation processes are almost the same; they differ only in where the latent codes come from, so generation can be seen as a form of manipulation. The problem is then how to change the desired attributes or regions of the given image according to the given texts while leaving the rest unchanged. (For generation this constraint can be relaxed: since there are infinitely many valid results, changing undesired attributes is acceptable.)
The first goal, changing the desired attributes or regions, is controlled by the CLIP loss (loss_weight_clip). This loss drives the optimization toward an image that is consistent with the texts. The other losses (loss_weight_feat and loss_weight_enc) try to keep everything else unchanged or, put another way, to faithfully reconstruct the other regions. You may have noticed that the two kinds of losses balance differently in different cases. In most cases, the losses responsible for the second goal are dominant: the primary goal of the optimization is to reconstruct the image and change some attributes only incidentally. In other cases, there is a boundary to cross. For example, when editing certain attributes, e.g. adding eyeglasses, you have to let the CLIP loss take charge during the optimization while still trying to reconstruct the remaining regions.
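To make the roles concrete, here is a minimal sketch (my own illustration, not the repo's actual code) of how the three weights typically combine the individual loss terms into one objective during latent-code optimization:

```python
def total_loss(loss_clip, loss_feat, loss_enc,
               loss_weight_clip=2.0, loss_weight_feat=1.0, loss_weight_enc=1.0):
    """Weighted sum steering the optimization (illustrative names/defaults):
    - loss_weight_clip scales the text-consistency term
      (change the desired attributes or regions),
    - loss_weight_feat and loss_weight_enc scale the reconstruction terms
      (keep the other regions unchanged)."""
    return (loss_weight_clip * loss_clip
            + loss_weight_feat * loss_feat
            + loss_weight_enc * loss_enc)

# Raising loss_weight_clip makes the text dominate the edit;
# raising the other two favors faithful reconstruction.
print(total_loss(0.5, 0.2, 0.1))
```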
So do these parameters change their values while generating the final output, or do they act as initial values for the weights?
It is a process of optimization rather than training. The optimization is image-specific (as mentioned in the Limitation section), which means these parameters may need to differ per image to obtain the desired output.
I understand that it is a process of optimization, and I also observed that the weights change while an image is being generated. Does that mean the weights (loss_feat, loss_reg, loss_clip) converge to their optimal values during generation? If so, why initialize them with specific values such as loss_weight_clip=2.0 rather than random values in their domain? Am I thinking about this right? I still need clarity on this. Please help.
Those default values were chosen empirically. The values in the screenshot you posted are the values of the losses, not the weights; the weights are fixed once you run the Python scripts.
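A toy illustration of that distinction (again my own sketch, not the repo's code): the weight is a constant set once per run, while the loss *value* changes every step as the latent code is optimized, which is what the printed numbers in the screenshot show.

```python
loss_weight_clip = 2.0   # fixed hyperparameter for the whole run
z = 5.0                  # stand-in scalar for the latent code

for step in range(3):
    loss_clip = z ** 2                  # the loss VALUE shrinks each step...
    grad = 2 * z * loss_weight_clip     # gradient of the weighted loss w.r.t. z
    z -= 0.1 * grad                     # gradient-descent update
    print(step, loss_weight_clip, loss_clip)  # ...while the weight stays 2.0
```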
I know such image-specific optimization is not elegant; maybe you can find a better way to obtain the desired latent codes under the guidance of texts.
Okay, got it. Thanks!