yuval-alaluf/stylegan3-editing

methodological question

NicolasDibot opened this issue · 2 comments

Hi

Everything in the code works fine for me (thank you!), but I have a methodological question that I could not find an answer to, even after reading the article.

To get the boundaries for editing, the SVM is trained on the encodings of randomly generated images sampled from the 512-d latent space. However, the encoding of a real image obtained with pSp has shape 16×512. From what I understood after reading the code, to edit an image the same transformation (using the weights obtained from the SVM) is applied to each of the 16 latent vectors of the encoded image. Is this correct? I have a hard time understanding how this can work because, to take the example of modifying the femininity of a face, it means the encodings are shifted in the same direction for all styles (i.e. the direction is the same for skin texture and for overall shape), which seems strange to me.
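To make sure I'm describing it right, here is a minimal sketch of what I understand the editing step to do (the names `w_plus`, `boundary`, and `alpha` are my own, not from the repo): the same 512-d direction is added to every one of the 16 style vectors.

```python
import numpy as np

def edit_w_plus(w_plus: np.ndarray, boundary: np.ndarray, alpha: float) -> np.ndarray:
    # w_plus:   (16, 512) pSp encoding of one real image (W+ space)
    # boundary: (512,)    normal of the SVM hyperplane (editing direction)
    # alpha:    scalar    editing strength
    # Broadcasting adds the same 512-d shift to each of the 16 style vectors.
    return w_plus + alpha * boundary[None, :]
```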

Parenthesis for context: I'm a researcher in animal behavior, and I want to modify the femininity of monkey faces to see whether it influences their attractiveness when the images are shown to real monkeys. But I first need to define a femininity-editing scale based on the natural variation in femininity observed in these monkeys (e.g. editing a female face that falls in the bottom 25% of femininity so that it falls in the top 25%). I need to understand on which of the 16 latent vectors (or on all of them?) I should compute this scale.
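In case it clarifies what I mean by "scale", what I have in mind is something like projecting each encoded face onto the SVM normal and using the percentiles of those projections to choose the editing strength. This is only my own sketch, not code from the repo, and the choice of averaging the 16 vectors before projecting is exactly what I'm unsure about:

```python
import numpy as np

def femininity_scores(w_plus_batch: np.ndarray, boundary: np.ndarray) -> np.ndarray:
    # w_plus_batch: (N, 16, 512) pSp encodings of N monkey faces
    # boundary:     (512,)       SVM normal for the femininity attribute
    # Average the 16 style vectors before projecting -- this averaging step
    # is precisely the methodological choice I am asking about.
    w_mean = w_plus_batch.mean(axis=1)   # (N, 512)
    return w_mean @ boundary             # signed distance along the direction

# Example: strength needed to move a face from the bottom quartile to the
# top quartile of the natural score distribution.
# scores = femininity_scores(all_encodings, boundary)
# alpha = np.percentile(scores, 87.5) - np.percentile(scores, 12.5)
```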

Your understanding is definitely correct :)
Basically, we shift all 16 vectors along the same editing direction.
Depending on the attribute you want to change, what you can perhaps do is alter only a subset of the vectors. StyleGAN divides its layers into coarse, medium, and fine layers, where the fine layers mainly affect the color and lighting of the generated image. If you are trying to change something like gender, you can try altering only the vectors corresponding to the coarse and medium layers. I believe these are layers 0 to 7, but you can double-check by referring to the StyleGAN2 paper.
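If it helps, here is a minimal sketch of what that could look like (the function name and the default layer indices are illustrative; verify the coarse/medium split against the StyleGAN2 paper):

```python
import numpy as np

def edit_coarse_medium(w_plus, boundary, alpha, layer_indices=range(0, 8)):
    # w_plus:   (16, 512) W+ latent, boundary: (512,), alpha: scalar strength.
    # Only the listed style vectors (here the coarse/medium layers 0-7) are
    # shifted; the fine layers keep their original values.
    edited = w_plus.copy()
    for i in layer_indices:
        edited[i] = edited[i] + alpha * boundary
    return edited
```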
Hope this helps.

@NicolasDibot - you may get some bandwidth from SadTalker - https://replicate.com/cjwbw/sadtalker