williamyang1991/DualStyleGAN

How can we change the encoder?

GrigorySamokhin opened this issue

Hello! Thank you for your work!

I'm wondering how I can replace the encoder in the current implementation with the one from https://github.com/eladrich/pixel2style2pixel. The first image shows this implementation; the second shows the pixel2style2pixel implementation.
[Screenshot: output of this implementation]
[Screenshot: output of the pixel2style2pixel implementation]

If I submit a latent vector from pixel2style2pixel, I get a different output.
[Screenshot: mismatched output when feeding the pixel2style2pixel latent]

But it is written in the README that it is possible to replace the encoder.

Thanks for your help!

Since our pSp uses a Z+ latent code while the original pSp uses a W+ latent code, when replacing our pSp with the original one you should modify

img_rec, instyle = encoder(F.adaptive_avg_pool2d(I, 256), randomize_noise=False, return_latents=True,
z_plus_latent=True, return_z_plus_latent=True, resize=False)

to

# the original pSp has no z_plus_latent or return_z_plus_latent options
img_rec, instyle = encoder(F.adaptive_avg_pool2d(I, 256), randomize_noise=False, return_latents=True, resize=False)

and modify

img_gen, _ = generator([instyle], exstyle, input_is_latent=False, z_plus_latent=True,
truncation=args.truncation, truncation_latent=0, use_res=True, interp_weights=args.weight)

to

# input_is_latent=True indicates that the input content code is in W+ space
img_gen, _ = generator([instyle], exstyle, input_is_latent=True, z_plus_latent=False, truncation=args.truncation, truncation_latent=0, use_res=True, interp_weights=args.weight)

Hope this can solve your problem.

By the way, you can set use_res=False in

img_gen, _ = generator([instyle], exstyle, input_is_latent=False, z_plus_latent=True,
truncation=args.truncation, truncation_latent=0, use_res=True, interp_weights=args.weight)

This will return the reconstructed content image. If it differs a lot from the original one,
you might have fed W+ into DualStyleGAN, which by default expects Z+ input and will mistakenly pass this W+ code through the mapping network before sending it to the generator.
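
For example, the check could look like this (a rough sketch, assuming the I, instyle, exstyle, generator, and args variables from style_transfer.py):

import torch
import torch.nn.functional as F

with torch.no_grad():
    # use_res=False disables the extrinsic style path, so img_gen should be
    # a plain reconstruction of the content image
    img_gen, _ = generator([instyle], exstyle, input_is_latent=False, z_plus_latent=True,
                           truncation=args.truncation, truncation_latent=0,
                           use_res=False, interp_weights=args.weight)
    img_gen = torch.clamp(img_gen.detach(), -1, 1)
    # a large error here suggests the code is in the wrong latent space,
    # e.g. a W+ code treated as Z+
    rec_err = F.mse_loss(F.adaptive_avg_pool2d(img_gen, 256),
                         F.adaptive_avg_pool2d(I, 256))
    print(f'reconstruction MSE: {rec_err.item():.4f}')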

Thanks for your fast reply! I will try and come back to you!

It works, thanks!

Could you please explain how should I replace the encoder except for the change above? Like if I want to use https://github.com/eladrich/pixel2style2pixel, should I download the whole repository and put it in the encoder directory? Sorry for this stupid question. I'm new to this field and don't have much experience.


My code just uses the encoder from https://github.com/eladrich/pixel2style2pixel

For a new encoder, you should find all .py files that have from model.encoder.psp import pSp at the top.
Then replace the encoder construction (encoder = pSp(opts).to(device).eval()) with your new encoder,
and modify all calls like _, stylecode = encoder(XXXX) in those files to match the way your encoder encodes an image.
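
In case it helps, the swap could look roughly like this (an untested sketch; it assumes the pixel2style2pixel repo is on your PYTHONPATH, that you downloaded its pretrained psp_ffhq_encode.pt checkpoint, and that device and I come from style_transfer.py):

import torch
import torch.nn.functional as F
from argparse import Namespace
from models.psp import pSp  # the pSp class from pixel2style2pixel, not model.encoder.psp

ckpt = torch.load('pretrained_models/psp_ffhq_encode.pt', map_location='cpu')
opts = Namespace(**ckpt['opts'])
opts.checkpoint_path = 'pretrained_models/psp_ffhq_encode.pt'
opts.device = device
encoder = pSp(opts).to(device).eval()

# the original pSp has no z_plus_latent / return_z_plus_latent options,
# so instyle comes back as a W+ code
img_rec, instyle = encoder(F.adaptive_avg_pool2d(I, 256), randomize_noise=False,
                           return_latents=True, resize=False)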

Thanks for your fast reply! But since your pSp uses a Z+ latent code, isn't it different from the original one at https://github.com/eladrich/pixel2style2pixel? Or can we just use the encoder you provided and simply modify style_transfer.py, as you implied above, to use the W+ latent code?

Below is the result I get when I try other encoders. I don't know where that male face comes from... Do you know what might be the reason?
[Screenshot: stylization result showing an unrelated male face]

I saw you saying that "If it differs a lot from the original one, you might have fed W+ into DualStyleGAN, which by default expects Z+ input and will mistakenly pass this W+ code through the mapping network before sending it to the generator." Could you explain how to fix that? Thanks.

All Z+ codes are transformed to W+ codes before being sent to the generator.
You just need to remove this transformation if you directly feed W+ codes.
Check the code, find these transformations, and make the modifications.

for example, change

exstyle = generator.generator.style(latent.reshape(latent.shape[0]*latent.shape[1], latent.shape[2])).reshape(latent.shape)

to

exstyle = latent
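
For context, this is what that line computes (a sketch): generator.generator.style is StyleGAN2's mapping MLP, applied to each of the 18 layer-wise codes independently, i.e. exactly the Z+-to-W+ transform:

import torch

latent = torch.randn(1, 18, 512)  # a Z+ code: one z vector per generator layer
# push every layer's z through the mapping MLP to get a W+ code
flat = latent.reshape(latent.shape[0] * latent.shape[1], latent.shape[2])
exstyle_from_zplus = generator.generator.style(flat).reshape(latent.shape)

# if the code is already W+ (e.g. from the original pSp), skip the MLP:
exstyle_from_wplus = latent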

And note the input_is_latent and z_plus_latent options:

# style transfer
# input_is_latent: instyle is not in W space
# z_plus_latent: instyle is in Z+ space
# use_res: use extrinsic style path, or the style is not transferred
# interp_weights: weight vector for style combination of two paths
img_gen, _ = generator([instyle], exstyle, input_is_latent=False, z_plus_latent=True,
truncation=args.truncation, truncation_latent=0, use_res=True, interp_weights=args.weight)
img_gen = torch.clamp(img_gen.detach(), -1, 1)
viz += [img_gen]

For a W+ code, you should use input_is_latent=True and z_plus_latent=False.
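
Side by side (using the variables from style_transfer.py):

# Z+ content code (this repo's pSp encoder):
img_gen, _ = generator([instyle], exstyle, input_is_latent=False, z_plus_latent=True,
                       truncation=args.truncation, truncation_latent=0,
                       use_res=True, interp_weights=args.weight)

# W+ content code (e.g. the original pixel2style2pixel encoder):
img_gen, _ = generator([instyle], exstyle, input_is_latent=True, z_plus_latent=False,
                       truncation=args.truncation, truncation_latent=0,
                       use_res=True, interp_weights=args.weight)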

Thanks for your clear explanation. Unfortunately, I'm still getting the same male face. Below are all the changes I have made; please let me know if I missed anything. (I'm just trying to use the same encoder in W+.)

# reconstructed content image and its intrinsic style code
img_rec, instyle = encoder(F.adaptive_avg_pool2d(I, 256), randomize_noise=False, return_latents=True,
z_plus_latent=True, return_z_plus_latent=True, resize=False)

to
img_rec, instyle = encoder(F.adaptive_avg_pool2d(I, 256), randomize_noise=False, return_latents=True, resize=False)

Now instyle should be W+, right?

exstyle = generator.generator.style(latent.reshape(latent.shape[0]*latent.shape[1], latent.shape[2])).reshape(latent.shape)

to
exstyle = latent

This step removes the translation from Z+ to W+, because the encoder already returns W+.

img_gen, _ = generator([instyle], exstyle, input_is_latent=False, z_plus_latent=True,
truncation=args.truncation, truncation_latent=0, use_res=True, interp_weights=args.weight)

to
img_gen, _ = generator([instyle], exstyle, input_is_latent=True, z_plus_latent=False, truncation=args.truncation, truncation_latent=0, use_res=False, interp_weights=args.weight)
This indicates that the input content code is in W+ space. For this step, I also tried using StyleGAN without exstyle and with truncation=1, but neither changed the result.

In my previous answer, I just gave two examples of where to modify, but they are not the only code that needs changes.
You need to check all the code around the encoder and make modifications beyond these two parts.
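
A small helper can list the places to inspect (a sketch; run it from the repository root):

from pathlib import Path

# print every line that touches the Z+/W+ handling so none is missed
for path in Path('.').rglob('*.py'):
    for lineno, line in enumerate(path.read_text(errors='ignore').splitlines(), 1):
        if 'z_plus_latent' in line or 'generator.generator.style' in line:
            print(f'{path}:{lineno}: {line.strip()}')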

And you can also check the reconstruction before doing the stylization:

By the way, you can set use_res=False in

img_gen, _ = generator([instyle], exstyle, input_is_latent=False, z_plus_latent=True,
truncation=args.truncation, truncation_latent=0, use_res=True, interp_weights=args.weight)

This will return the reconstructed content image. If it differs a lot from the original one,
you might have fed W+ into DualStyleGAN, which by default expects Z+ input and will mistakenly pass this W+ code through the mapping network before sending it to the generator.

Thanks for your fast reply again! Theoretically, I think I understand what you said, but I only started learning some CV this month, so it's hard for me to spot these places. I will look into the code more. Just want to confirm two more things:

  1. Is there any other file I need to modify other than style_transfer.py? To my understanding, the encoder now returns W+ as-is, and all of the transformations between Z+ and W+ are in style_transfer.py?
  2. This line is the only one that uses the encoder, right?
    # reconstructed content image and its intrinsic style code
    img_rec, instyle = encoder(F.adaptive_avg_pool2d(I, 256), randomize_noise=False, return_latents=True,
    z_plus_latent=True, return_z_plus_latent=True, resize=False)

    And by changing it to img_rec, instyle = encoder(F.adaptive_avg_pool2d(I, 256), randomize_noise=False, return_latents=True, resize=False), shouldn't we already have img_rec and instyle in W+? (This is the part that confuses me: the original encoder should already return the reconstructed face from W+ here, so I'm not sure why the reconstruction still differs so much from the original.)
    Then, because instyle is already in W+, we don't need to convert exstyle (or any other transformation I couldn't find) to W+ again; we can directly feed the W+ code to the generator.

If you only test the code and do not train the model, then you only need to modify style_transfer.py and psp.py (changing the latter to your W+ encoder).

exstyle is always the W+ code when feeding into the generator.

I suggest reading this piece of code:

if return_latents:
    if z_plus_latent and return_z_plus_latent:
        return images, codes
    if z_plus_latent and not return_z_plus_latent:
        return images, result_latent
    else:
        return images, result_latent

It's much clearer than my explanation.

I --> z+ --> w+ --> img_rec
return_z_plus_latent only controls whether z+ or w+ is returned; both settings lead to the same img_rec.
If you use my encoder, you should always use z_plus_latent=True; this is for the generator.
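
In other words, the two call patterns of my pSp look like this (a sketch; x stands for the 256x256 input image):

# both calls run I --> z+ --> w+ --> img_rec internally;
# only the returned code differs
img_rec, z_plus = encoder(x, randomize_noise=False, return_latents=True,
                          z_plus_latent=True, return_z_plus_latent=True, resize=False)
img_rec, w_plus = encoder(x, randomize_noise=False, return_latents=True,
                          z_plus_latent=True, return_z_plus_latent=False, resize=False)

# z_plus pairs with generator(..., input_is_latent=False, z_plus_latent=True);
# w_plus would pair with generator(..., input_is_latent=True, z_plus_latent=False)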

Finally, I cannot debug for you.

Thanks for your reply. I will go back and check the encoder. Sorry for making you debug for me; I didn't mean to. Again, thanks a lot for your patience and help!