Transfer learning from NVIDIA's pre-trained StyleGAN (FFHQ)

Hi,

I am using the pre-trained pkl file: https://api.ngc.nvidia.com/v2/models/nvidia/research/stylegan2/versions/1/files/stylegan2-ffhq-256x256.pkl. I attempted transfer learning (without augmentation) from FFHQ to CelebA-HQ:

python train.py --outdir=~/training-runs --data=~/datasets/FFHQ/GenderTrainSamples_0.025.zip --gpus=1 --workers 1 --resume=https://api.ngc.nvidia.com/v2/models/nvidia/research/stylegan2/versions/1/files/stylegan2-ffhq-256x256.pkl --aug=noaug --kimg 200

However, when I look at the initial generated images, I see this:

but when I check the FID against the FFHQ dataset, I get FID ≈ 9.

Can anyone explain what is going on?

Hi, I faced the same problem and found that the default model configuration differs from the one the pre-trained model was trained with.

Solution

Change the configuration from --cfg=auto (the default) to --cfg=paper256 for this pre-trained model, i.e. add --cfg=paper256 to the train.py command.
(For other pre-trained models, use the same configuration they were trained with.)

fakes_init.png with --cfg=auto:

fakes_init.png with --cfg=paper256:

Explanation

The configuration controls the model's channel_base, the number of layers in the mapping network, and the group size of the discriminator's minibatch standard deviation layer. For instance, the mapping network has 8 layers with --cfg=paper256, while it has only 2 layers with --cfg=auto.

To keep the model structure the same as the pre-trained one, make sure that fmaps, map, and mbstd in the configuration match the values the model was trained with.

stylegan2-ada-pytorch/train.py

Lines 154 to 161 in 6f160b3

    
    cfg_specs = {
        'auto':      dict(ref_gpus=-1, kimg=25000,  mb=-1, mbstd=-1, fmaps=-1,  lrate=-1,     gamma=-1,   ema=-1,  ramp=0.05, map=2), # Populated dynamically based on resolution and GPU count.
        'stylegan2': dict(ref_gpus=8,  kimg=25000,  mb=32, mbstd=4,  fmaps=1,   lrate=0.002,  gamma=10,   ema=10,  ramp=None, map=8), # Uses mixed-precision, unlike the original StyleGAN2.
        'paper256':  dict(ref_gpus=8,  kimg=25000,  mb=64, mbstd=8,  fmaps=0.5, lrate=0.0025, gamma=1,    ema=20,  ramp=None, map=8),
        'paper512':  dict(ref_gpus=8,  kimg=25000,  mb=64, mbstd=8,  fmaps=1,   lrate=0.0025, gamma=0.5,  ema=20,  ramp=None, map=8),
        'paper1024': dict(ref_gpus=8,  kimg=25000,  mb=32, mbstd=4,  fmaps=1,   lrate=0.002,  gamma=2,    ema=10,  ramp=None, map=8),
        'cifar':     dict(ref_gpus=2,  kimg=100000, mb=64, mbstd=32, fmaps=1,   lrate=0.0025, gamma=0.01, ema=500, ramp=0.05, map=2),
    }

stylegan2-ada-pytorch/train.py

Lines 176 to 183 in 6f160b3

    
    args.G_kwargs = dnnlib.EasyDict(class_name='training.networks.Generator', z_dim=512, w_dim=512, mapping_kwargs=dnnlib.EasyDict(), synthesis_kwargs=dnnlib.EasyDict())
    args.D_kwargs = dnnlib.EasyDict(class_name='training.networks.Discriminator', block_kwargs=dnnlib.EasyDict(), mapping_kwargs=dnnlib.EasyDict(), epilogue_kwargs=dnnlib.EasyDict())
    args.G_kwargs.synthesis_kwargs.channel_base = args.D_kwargs.channel_base = int(spec.fmaps * 32768)
    args.G_kwargs.synthesis_kwargs.channel_max = args.D_kwargs.channel_max = 512
    args.G_kwargs.mapping_kwargs.num_layers = spec.map
    args.G_kwargs.synthesis_kwargs.num_fp16_res = args.D_kwargs.num_fp16_res = 4 # enable mixed-precision training
    args.G_kwargs.synthesis_kwargs.conv_clamp = args.D_kwargs.conv_clamp = 256 # clamp activations to avoid float16 overflow
    args.D_kwargs.epilogue_kwargs.mbstd_group_size = spec.mbstd
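
To see at a glance which structural fields differ between two configurations, here is a minimal sketch. The values are copied from the cfg_specs table quoted above, with the training-only fields (lrate, gamma, ema, ...) left out. The helper names structural_diff and channel_base are my own and are not part of the repository; a value of -1 for 'auto' only means "filled in at runtime", so the diff for 'auto' shows the placeholder rather than the resolved value.

```python
# Structural fields copied from the cfg_specs table quoted from train.py.
# Training-only fields (lrate, gamma, ema, ...) are intentionally omitted.
CFG_SPECS = {
    'auto':      dict(mbstd=-1, fmaps=-1,  map=2),  # -1 is a placeholder resolved at runtime
    'stylegan2': dict(mbstd=4,  fmaps=1,   map=8),
    'paper256':  dict(mbstd=8,  fmaps=0.5, map=8),
    'paper512':  dict(mbstd=8,  fmaps=1,   map=8),
    'paper1024': dict(mbstd=4,  fmaps=1,   map=8),
    'cifar':     dict(mbstd=32, fmaps=1,   map=2),
}

# The three fields the explanation above says must match the checkpoint.
STRUCTURAL_KEYS = ('fmaps', 'map', 'mbstd')


def channel_base(fmaps):
    """Same arithmetic as `int(spec.fmaps * 32768)` in the quoted train.py lines."""
    return int(fmaps * 32768)


def structural_diff(cfg_a, cfg_b, specs=CFG_SPECS):
    """Hypothetical helper: return {field: (value_a, value_b)} for fields that differ."""
    a, b = specs[cfg_a], specs[cfg_b]
    return {key: (a[key], b[key]) for key in STRUCTURAL_KEYS if a[key] != b[key]}


if __name__ == '__main__':
    # Compare the default config with the one used for the FFHQ 256x256 checkpoint.
    print(structural_diff('auto', 'paper256'))
    # channel_base for fmaps=0.5 versus fmaps=1.
    print(channel_base(0.5), channel_base(1))
```

The map entry (2 versus 8) is the mismatch described above; fmaps and mbstd for 'auto' have to be checked against the values train.py actually resolves for your resolution and GPU count.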

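In addition, when loading the pre-trained model, the function copy_params_and_buffers silently ignores parameters in the pre-trained model that have no counterpart in the new model (and, with require_all=False, leaves new parameters that are missing from the checkpoint at their initial values), so the inconsistency is never reported.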

stylegan2-ada-pytorch/torch_utils/misc.py

Lines 153 to 160 in 6f160b3

    
    def copy_params_and_buffers(src_module, dst_module, require_all=False):
        assert isinstance(src_module, torch.nn.Module)
        assert isinstance(dst_module, torch.nn.Module)
        src_tensors = {name: tensor for name, tensor in named_params_and_buffers(src_module)}
        for name, tensor in named_params_and_buffers(dst_module):
            assert (name in src_tensors) or (not require_all)
            if name in src_tensors:
                tensor.copy_(src_tensors[name].detach()).requires_grad_(tensor.requires_grad)
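
Because the copy above only looks up destination names in the source, a check like the following could be run before resuming to surface the mismatch. This is a rough sketch, not code from the repository: diff_tensor_names is a hypothetical helper that works on plain name-to-shape dicts so it runs without PyTorch, and the tensor names in the example are illustrative only. With real modules, the dicts could be built along the lines of {name: tuple(t.shape) for name, t in named_params_and_buffers(module)}.

```python
def diff_tensor_names(src_shapes, dst_shapes):
    """Hypothetical helper: compare two {name: shape} mappings.

    src_shapes describes the pre-trained (source) network, dst_shapes the
    freshly constructed (destination) network.
    """
    src_names, dst_names = set(src_shapes), set(dst_shapes)
    shared = src_names & dst_names
    return {
        # Present in the checkpoint but never copied anywhere.
        'unused_in_src': sorted(src_names - dst_names),
        # Present in the new network but absent from the checkpoint.
        'missing_in_src': sorted(dst_names - src_names),
        # Same name on both sides, but different tensor shapes.
        'shape_mismatch': sorted(
            name for name in shared
            if tuple(src_shapes[name]) != tuple(dst_shapes[name])
        ),
    }


if __name__ == '__main__':
    # Illustrative names only: an 8-layer mapping network (checkpoint)
    # versus a 2-layer one (new network), as in the paper256 vs. auto case.
    src = {f'mapping.fc{i}.weight': (512, 512) for i in range(8)}
    dst = {f'mapping.fc{i}.weight': (512, 512) for i in range(2)}
    report = diff_tensor_names(src, dst)
    for kind, names in report.items():
        print(kind, names)
```

In this example the six extra mapping layers end up under unused_in_src, which is exactly the case that copy_params_and_buffers skips without any message.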

@JackywithaWhiteDog Hi. Where did this weight come from? I can only find the 256x256 pre-trained weights in the transfer-learning folder. The website is different from the one in this issue.

Hi @githuboflk, sorry that I didn't notice your question earlier. I also used the pre-trained weights provided in the README, as you mentioned.

However, I think the checkpoint in this issue is available in the NVIDIA NGC Catalog.
