
pre-trained models

chenxwh opened this issue · 10 comments

Hi, are the pre-trained models released here ready to be used for any customised content/style image pairs? Thanks!

Yes, all the released models are designed for arbitrary style transfer.

Hi @JarrentWu1031, thanks for the great work and for sharing all the pre-trained models and code. Is vgg_normalized.pth the only pre-trained model required for training? Could you share some information on how to train this encoder?

Yes, the only pre-trained model involved here is the vgg model. It is trained on ImageNet for classification. If you are interested, you could train your own encoder from scratch. You could refer to the original paper of VGG.

Hi @JarrentWu1031, I have tried to retrain the encoder, but the results do not look good. Do you have any recommended encoder code for this? Thanks.

In this paper, we didn't train our own encoder. Instead, we use the pre-trained model from 'https://drive.google.com/file/d/1EpkBA2K2eYILDSyPTt0fztz59UjAIpZU/view?usp=sharing' as previous works do. If you really want to retrain your own encoder, you could search for something like 'VGG ImageNet classification'. I think it's fine for you to use the pre-trained model as well. Our work aims to train the SCT module and the decoder, not the encoder (we assume the extracted features are meaningful).

Hi @JarrentWu1031, I retrained the VGG net with PyTorch on the ImageNet classification task, and the model converged. But it does not work for style transfer. I dumped the weights of vgg_normalised.pt, and its first conv layer's weights and biases are:

0.weight    0.bias
0           -103.939
0           -116.779
255         -123.68

Do you know why?

Hi Zheng, I may have misunderstood you at the very beginning. In fact, the entire stylization network in this work consists of three parts: a VGG encoder, a decoder, and an SCT module (proposed in this paper). Only the decoder and the SCT module need to be trained to perform stylization; the encoder (pre-trained on ImageNet) does not need to be trained.
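The split described above (frozen encoder, trainable SCT module and decoder) can be sketched in PyTorch as follows. This is only an illustration: the three tiny modules and the placeholder loss are hypothetical stand-ins, not the repository's actual networks or losses, and the real SCT module also takes style features as input.

```python
import torch
import torch.nn as nn

# Hypothetical stand-ins for the three parts of the pipeline.
# The real encoder, SCT module, and decoder are defined in the repository.
encoder = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU())
sct = nn.Conv2d(8, 8, 1)
decoder = nn.Conv2d(8, 3, 3, padding=1)

# Freeze the encoder: no gradients are computed for its parameters.
encoder.eval()
for p in encoder.parameters():
    p.requires_grad_(False)

# Only the SCT module and the decoder are handed to the optimizer.
trainable = list(sct.parameters()) + list(decoder.parameters())
optimizer = torch.optim.Adam(trainable, lr=1e-4)

# Snapshot the encoder weights so we can confirm they stay untouched.
encoder_before = [p.clone() for p in encoder.parameters()]

content = torch.rand(1, 3, 32, 32)        # dummy content image
features = encoder(content)               # fixed feature extraction
output = decoder(sct(features))           # trainable part of the pipeline
loss = output.pow(2).mean()               # placeholder loss, for illustration only

optimizer.zero_grad()
loss.backward()
optimizer.step()

encoder_unchanged = all(
    torch.equal(before, after)
    for before, after in zip(encoder_before, encoder.parameters())
)
```

After one optimizer step, `encoder_unchanged` is `True`, while the SCT and decoder parameters have received gradients.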


Yes, I understand the whole pipeline, and I am trying to train a lightweight network for real-time inference. But I found that the VGG-based encoder takes a lot of time, which is why I am trying to retrain the encoder.

Well, that is strange. It seems the provided VGG model shares the same weights as yours. Have you fixed the parameters of your pre-trained encoder during style transfer training?

Yes, I have fixed the encoder's parameters. It seems the provided VGG weights have been normalized on the ImageNet dataset. I found that many arbitrary style transfer methods use these normalized VGG weights.
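One possible reading of the dumped first-layer values, offered as an assumption rather than a confirmed explanation: the three biases match the per-channel BGR means commonly used for Caffe-style VGG preprocessing, and a weight of 255 would rescale an input in [0, 1] to [0, 255]. Under that reading, the first layer acts as input preprocessing folded into the network. The sketch below applies such a 1x1 convolution to a single pixel. Only the weight row (0, 0, 255) and the biases come from the dump; the other two weight rows are assumed to follow the same pattern.

```python
# Biases taken from the dump earlier in this thread.
BIAS = [-103.939, -116.779, -123.68]

# 1x1 conv weights, one row per output channel.
# Only the first row appears in the dump; the other two are assumptions.
WEIGHT = [
    [0.0, 0.0, 255.0],   # from the dump: output 0 reads input channel 2
    [0.0, 255.0, 0.0],   # assumed
    [255.0, 0.0, 0.0],   # assumed
]


def first_layer(rgb):
    """Apply the 1x1 convolution to one pixel given as (r, g, b) in [0, 1]."""
    return [
        sum(w * x for w, x in zip(row, rgb)) + b
        for row, b in zip(WEIGHT, BIAS)
    ]


pixel = (1.0, 0.5, 0.0)      # an orange-ish pixel in RGB, range [0, 1]
out = first_layer(pixel)     # channels reversed, scaled by 255, means subtracted
```

If this reading holds, an encoder retrained with a different input convention (for example, a different channel order or value range) may produce features on a different scale from the ones the released decoder and SCT module were trained on.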