How can we change the encoder?
GrigorySamokhin opened this issue · 13 comments
Hello! Thank you for your work!
I'm wondering how I can replace the encoder in the current implementation with the https://github.com/eladrich/pixel2style2pixel encoder. The first image is from this implementation, the second from the pixel2style2pixel implementation.
If I feed in a latent vector from pixel2style2pixel, I get a different output.
But it is written in the README that it is possible to replace the encoder.
Thanks for your help!
Since our pSp uses the Z+ latent code while the original pSp uses the W+ latent code, when replacing our pSp with the original one you should modify
DualStyleGAN/style_transfer.py
Lines 103 to 104 in 39a9e9e
to
# the original pSp has no options of z_plus_latent and return_z_plus_latent
img_rec, instyle = encoder(F.adaptive_avg_pool2d(I, 256), randomize_noise=False, return_latents=True, resize=False)
and modify
DualStyleGAN/style_transfer.py
Lines 126 to 127 in 39a9e9e
to
# input_is_latent=True indicates that the input content code is in W+ space
img_gen, _ = generator([instyle], exstyle, input_is_latent=True, z_plus_latent=False, truncation=args.truncation, truncation_latent=0, use_res=True, interp_weights=args.weight)
Hope this can solve your problem.
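To make the contrast concrete, here is a side-by-side sketch of the two call styles. The Z+ lines approximate the referenced original code (the exact flags there may differ slightly); the W+ lines are the replacements above. Variables like encoder, generator, I, exstyle, and args come from style_transfer.py.

```python
import torch.nn.functional as F

# --- This repo's pSp (Z+ space); approximates the referenced lines ---
img_rec, instyle = encoder(F.adaptive_avg_pool2d(I, 256), randomize_noise=False,
                           return_latents=True, z_plus_latent=True,
                           return_z_plus_latent=True, resize=False)
img_gen, _ = generator([instyle], exstyle, z_plus_latent=True,
                       truncation=args.truncation, truncation_latent=0,
                       use_res=True, interp_weights=args.weight)

# --- Original pSp (W+ space); note input_is_latent=True, z_plus_latent=False ---
img_rec, instyle = encoder(F.adaptive_avg_pool2d(I, 256), randomize_noise=False,
                           return_latents=True, resize=False)
img_gen, _ = generator([instyle], exstyle, input_is_latent=True, z_plus_latent=False,
                       truncation=args.truncation, truncation_latent=0,
                       use_res=True, interp_weights=args.weight)
```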
By the way, you can set use_res=False in
DualStyleGAN/style_transfer.py
Lines 126 to 127 in 39a9e9e
This will return the reconstructed content image. If it differs a lot from the original one,
you might have fed W+ into DualStyleGAN, which by default accepts Z+ input and would then mistakenly feed this W+ code into the mapping network before sending it to the generator.
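A quick sanity check along these lines (a sketch; variables come from style_transfer.py, and I is the aligned content image):

```python
# With use_res=False the extrinsic style path is disabled, so the output
# should be close to a plain reconstruction of the content image.
img_rec, _ = generator([instyle], exstyle, z_plus_latent=True,
                       truncation=args.truncation, truncation_latent=0,
                       use_res=False, interp_weights=args.weight)
# Compare img_rec with I: a large mismatch suggests a W+ code was fed
# where a Z+ code is expected (or vice versa).
```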
Thanks for your fast reply! I will try it and get back to you!
It works, thanks!
Could you please explain how I should replace the encoder, apart from the change above? For example, if I want to use https://github.com/eladrich/pixel2style2pixel, should I download the whole repository and put it in the encoder directory? Sorry for this stupid question; I'm new to this field and don't have much experience.
My code just uses the encoder from https://github.com/eladrich/pixel2style2pixel.
For a new encoder, you should find all .py files that have from model.encoder.psp import pSp at the top.
Then replace the encoder construction (encoder = pSp(opts).to(device).eval()) with your new encoder,
and modify all calls like _, stylecode = encoder(XXXX) in those files to match the way your encoder encodes an image.
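To illustrate, a minimal sketch of such a swap; NewEncoder and my_encoder are hypothetical placeholders for your own W+ encoder, not part of either repo:

```python
# Before: the repo's Z+ pSp encoder
# from model.encoder.psp import pSp
# encoder = pSp(opts).to(device).eval()

# After (hypothetical): your own encoder
from my_encoder import NewEncoder  # placeholder module and class names
encoder = NewEncoder(opts).to(device).eval()

# Then adapt every call site to your encoder's signature, e.g.:
# img_rec, stylecode = encoder(F.adaptive_avg_pool2d(I, 256))
```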
Thanks for your fast reply! But since your pSp uses the Z+ latent code, isn't it different from the original one here: https://github.com/eladrich/pixel2style2pixel? Or can we just use the encoder you provided and simply modify style_transfer.py as you suggested above to use the W+ latent code?
Below is the result I get when I try other encoders. I don't know where that male face comes from... Do you know what might be the reason?
I saw you saying that "If it differs a lot from the original one, you might have fed W+ into DualStyleGAN, which by default accepts Z+ input and would then mistakenly feed this W+ code into the mapping network before sending it to the generator." Could you explain how to fix that? Thanks.
All Z+ codes are transformed to W+ codes before being sent to the generator.
If you directly feed W+ codes, you just need to remove this transformation.
You need to check the code, find these transformations, and make the modifications.
For example, change
DualStyleGAN/style_transfer.py
Line 113 in 998a9ef
to
exstyle = latent
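For context, the replaced line presumably pushes the stored Z+ code through the mapping network, roughly like this (a sketch; the exact original line may differ):

```python
# Before (sketch): map the stored Z+ exstyle to W+ via the mapping network
exstyle = generator.generator.style(
    latent.reshape(latent.shape[0] * latent.shape[1], latent.shape[2])
).reshape(latent.shape)

# After: the code is already W+, so pass it through unchanged
exstyle = latent
```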
Also note the options input_is_latent and z_plus_latent in
DualStyleGAN/style_transfer.py
Lines 121 to 129 in 998a9ef
For W+ codes, you should use input_is_latent=True and z_plus_latent=False.
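Putting the flag pairing into one helper makes the rule explicit (a sketch; latent_space is a hypothetical convenience argument, and the other variables come from style_transfer.py):

```python
def run_generator(generator, instyle, exstyle, args, latent_space="z+"):
    """Pick the flag pairing that matches the space of the content code."""
    is_w = (latent_space == "w+")
    img_gen, _ = generator([instyle], exstyle,
                           input_is_latent=is_w,    # True only for W+ codes
                           z_plus_latent=not is_w,  # True only for Z+ codes
                           truncation=args.truncation, truncation_latent=0,
                           use_res=True, interp_weights=args.weight)
    return img_gen
```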
Thanks for your clear explanation. Unfortunately, I'm still getting the same male face. Below are all the changes I have made. Please let me know if I missed anything. (I'm just trying to use the same encoder in W+.)
DualStyleGAN/style_transfer.py
Lines 102 to 104 in 998a9ef
to
img_rec, instyle = encoder(F.adaptive_avg_pool2d(I, 256), randomize_noise=False, return_latents=True, resize=False)
Now 'instyle' should be in W+, right?
DualStyleGAN/style_transfer.py
Line 113 in 998a9ef
to
exstyle = latent
This step removes the transformation from Z+ to W+, because the encoder already returns W+.
DualStyleGAN/style_transfer.py
Lines 126 to 127 in 998a9ef
to
img_gen, _ = generator([instyle], exstyle, input_is_latent=True, z_plus_latent=False, truncation=args.truncation, truncation_latent=0, use_res=False, interp_weights=args.weight)
This indicates that the input content code is in W+ space. For this step, I also tried using StyleGAN without exstyle and with truncation=1, but neither changed the result.
In my previous answer, I just gave two examples of where you should make modifications, but they are not the only code that needs to be modified.
You need to check all the code related to the encoder and make modifications beyond these two parts.
You can also check the reconstruction before doing stylization:
By the way, you can set use_res=False in
DualStyleGAN/style_transfer.py
Lines 126 to 127 in 39a9e9e
This will return the reconstructed content image. If it differs a lot from the original one,
you might have fed W+ into DualStyleGAN, which by default accepts Z+ input and would then mistakenly feed this W+ code into the mapping network before sending it to the generator.
Thanks for your fast reply again! Theoretically, I think I understand what you said, but since I just started learning some CV this month, it's kind of hard for me to spot. I will look into the code more. I just want to confirm two more things:
- Is there any other file I need to modify besides "style_transfer.py"? To my understanding, the encoder returns W+ as it is, and all of the transformations between Z+ and W+ are in "style_transfer.py"?
- Is this the only line that uses the encoder?
DualStyleGAN/style_transfer.py
Lines 102 to 104 in 998a9ef
And by changing it to img_rec, instyle = encoder(F.adaptive_avg_pool2d(I, 256), randomize_noise=False, return_latents=True, resize=False)
shouldn't we already have 'img_rec' and 'instyle' in W+? (This is the part that confuses me, because the original encoder should already return the reconstructed face in W+ here, and I'm not sure why the reconstructed face still differs a lot from the original one.)
Then, because we already have 'instyle' in W+, we don't need to convert 'exstyle' (or the other transformations that I couldn't find) to W+ again, and we can directly feed the W+ code to the generator.
If you only test the code and don't train the model, then you only need to modify style_transfer.py and psp.py (changing to your W+ encoder).
exstyle is always a W+ code when fed into the generator.
I suggest you read the code of
DualStyleGAN/model/encoder/psp.py
Lines 106 to 112 in 39a9e9e
It's much clearer than my explanation.
I --> z+ --> w+ --> img_rec
The return_z_plus_latent option only determines whether Z+ or W+ is returned; both paths lead to the same img_rec.
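In pseudocode, the flow looks roughly like this (paraphrased from the description above, not the exact psp.py code):

```python
# Sketch of pSp.forward (paraphrased):
def forward(self, x, return_z_plus_latent=True):
    codes = self.encoder(x)                   # image I --> Z+ codes
    images, w_latent = self.decoder([codes],  # Z+ --> W+ --> img_rec
                                    z_plus_latent=True,
                                    return_latents=True)
    if return_z_plus_latent:
        return images, codes                  # hand back the Z+ code
    return images, w_latent                   # hand back the W+ code
```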
If you use my encoder, you should always use z_plus_latent=True; this option is for the generator.
Finally, I cannot debug for you.
Thanks for your reply. I will go back and check the encoder. Sorry for making you debug for me; I didn't mean to. Again, thanks a lot for your patience and help!