NVlabs/stylegan2-ada-pytorch

Transfer learning from NVIDIA's pre-trained StyleGAN2 (FFHQ)

Bearwithchris opened this issue · 3 comments

Hi,

Using the pre-trained pkl file https://api.ngc.nvidia.com/v2/models/nvidia/research/stylegan2/versions/1/files/stylegan2-ffhq-256x256.pkl, I've attempted transfer learning (without augmentation) from FFHQ to CelebA-HQ:

python train.py --outdir=~/training-runs --data=~/datasets/FFHQ/GenderTrainSamples_0.025.zip --gpus=1 --workers 1 --resume=https://api.ngc.nvidia.com/v2/models/nvidia/research/stylegan2/versions/1/files/stylegan2-ffhq-256x256.pkl --aug=noaug --kimg 200

However, when looking at the initial generated images, I see this:
[image: fakes_init.png]

Yet when I check the FID against the FFHQ dataset, it comes out at ~9.

Can anyone explain what is going on?

Hi, I faced the same problem and found that the default configuration builds a model that differs from the pre-trained one.

Solution

Change the configuration from --cfg=auto (the default) to --cfg=paper256 for this pre-trained model.
(For other pre-trained models, use the configuration they were trained with.)
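For example, the command from the question would then read (everything else unchanged):

python train.py --outdir=~/training-runs --data=~/datasets/FFHQ/GenderTrainSamples_0.025.zip --gpus=1 --workers 1 --cfg=paper256 --resume=https://api.ngc.nvidia.com/v2/models/nvidia/research/stylegan2/versions/1/files/stylegan2-ffhq-256x256.pkl --aug=noaug --kimg 200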

fakes_init.png with --cfg=auto:

[image: fakes_init_auto]

fakes_init.png with --cfg=paper256:

[image: fakes_init_paper256]

Explanation

The configuration controls the model's channel_base, the number of layers in the mapping network, and the minibatch standard deviation layer of the discriminator. For instance, the mapping network has 8 layers with --cfg=paper256 but only 2 with --cfg=auto.

To keep the model structure identical to the pre-trained one, make sure that fmaps, map, and mbstd in the configuration match the values the checkpoint was trained with.

cfg_specs = {
    'auto':      dict(ref_gpus=-1, kimg=25000,  mb=-1, mbstd=-1, fmaps=-1,  lrate=-1,     gamma=-1,   ema=-1,  ramp=0.05, map=2), # Populated dynamically based on resolution and GPU count.
    'stylegan2': dict(ref_gpus=8,  kimg=25000,  mb=32, mbstd=4,  fmaps=1,   lrate=0.002,  gamma=10,   ema=10,  ramp=None, map=8), # Uses mixed-precision, unlike the original StyleGAN2.
    'paper256':  dict(ref_gpus=8,  kimg=25000,  mb=64, mbstd=8,  fmaps=0.5, lrate=0.0025, gamma=1,    ema=20,  ramp=None, map=8),
    'paper512':  dict(ref_gpus=8,  kimg=25000,  mb=64, mbstd=8,  fmaps=1,   lrate=0.0025, gamma=0.5,  ema=20,  ramp=None, map=8),
    'paper1024': dict(ref_gpus=8,  kimg=25000,  mb=32, mbstd=4,  fmaps=1,   lrate=0.002,  gamma=2,    ema=10,  ramp=None, map=8),
    'cifar':     dict(ref_gpus=2,  kimg=100000, mb=64, mbstd=32, fmaps=1,   lrate=0.0025, gamma=0.01, ema=500, ramp=0.05, map=2),
}
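For context, with --cfg=auto the -1 placeholders are resolved at startup from the dataset resolution and GPU count; the logic in train.py looks roughly like this (abridged). Note that map is never reassigned, so it stays at 2, and at 256×256 auto happens to pick fmaps=0.5 just like paper256, which is why the channel counts match but the mapping network does not:

spec = dnnlib.EasyDict(cfg_specs[cfg])
if cfg == 'auto':
    spec.ref_gpus = gpus
    res = args.training_set_kwargs.resolution
    spec.mb = max(min(gpus * min(4096 // res, 32), 64), gpus) # keep gpu memory consumption at bay
    spec.mbstd = min(spec.mb // gpus, 4) # mbstd group size kept small and fixed
    spec.fmaps = 1 if res >= 512 else 0.5
    spec.lrate = 0.002 if res >= 1024 else 0.0025
    spec.gamma = 0.0002 * (res ** 2) / spec.mb # heuristic formula
    spec.ema = spec.mb * 10 / 32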

args.G_kwargs = dnnlib.EasyDict(class_name='training.networks.Generator', z_dim=512, w_dim=512, mapping_kwargs=dnnlib.EasyDict(), synthesis_kwargs=dnnlib.EasyDict())
args.D_kwargs = dnnlib.EasyDict(class_name='training.networks.Discriminator', block_kwargs=dnnlib.EasyDict(), mapping_kwargs=dnnlib.EasyDict(), epilogue_kwargs=dnnlib.EasyDict())
args.G_kwargs.synthesis_kwargs.channel_base = args.D_kwargs.channel_base = int(spec.fmaps * 32768) # fmaps scales the per-resolution channel counts
args.G_kwargs.synthesis_kwargs.channel_max = args.D_kwargs.channel_max = 512
args.G_kwargs.mapping_kwargs.num_layers = spec.map # map sets the mapping network depth
args.G_kwargs.synthesis_kwargs.num_fp16_res = args.D_kwargs.num_fp16_res = 4 # enable mixed-precision training
args.G_kwargs.synthesis_kwargs.conv_clamp = args.D_kwargs.conv_clamp = 256 # clamp activations to avoid float16 overflow
args.D_kwargs.epilogue_kwargs.mbstd_group_size = spec.mbstd # mbstd sets the minibatch-stddev group size
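To see concretely what fmaps changes: in training/networks.py the per-resolution channel count is derived as roughly min(channel_base // res, channel_max). A quick sketch (the resolution list is illustrative) shows fmaps=0.5 halving the width of the layers at 64×64 and above:

# Sketch of how fmaps maps to per-resolution channel counts
# (mirrors the channels_dict computation in training/networks.py).
channel_max = 512
for fmaps in (0.5, 1):
    channel_base = int(fmaps * 32768)
    channels = {res: min(channel_base // res, channel_max) for res in [4, 8, 16, 32, 64, 128, 256]}
    print(f'fmaps={fmaps}:', channels)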

In addition, when loading the pre-trained model, the function copy_params_and_buffers silently ignores any parameters of the pre-trained network that have no counterpart in the newly constructed one, so the inconsistency is never reported.

def copy_params_and_buffers(src_module, dst_module, require_all=False):
    assert isinstance(src_module, torch.nn.Module)
    assert isinstance(dst_module, torch.nn.Module)
    src_tensors = {name: tensor for name, tensor in named_params_and_buffers(src_module)}
    for name, tensor in named_params_and_buffers(dst_module):
        assert (name in src_tensors) or (not require_all)
        if name in src_tensors:
            tensor.copy_(src_tensors[name].detach()).requires_grad_(tensor.requires_grad)
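As a quick sanity check before committing to a long run, you can list the parameter names that would be skipped silently. This is only a sketch, assuming it runs from the repo root with a generator G already constructed via the same --cfg you intend to pass to train.py:

# Sketch: report the name mismatches that copy_params_and_buffers skips silently.
import dnnlib
import legacy
from torch_utils.misc import named_params_and_buffers

url = 'https://api.ngc.nvidia.com/v2/models/nvidia/research/stylegan2/versions/1/files/stylegan2-ffhq-256x256.pkl'
with dnnlib.util.open_url(url) as f:
    src_G = legacy.load_network_pkl(f)['G_ema']

src_names = {name for name, _ in named_params_and_buffers(src_G)}
dst_names = {name for name, _ in named_params_and_buffers(G)}  # G: your freshly built generator
print('In checkpoint but not in model (silently dropped):', sorted(src_names - dst_names))
print('In model but not in checkpoint (left at random init):', sorted(dst_names - src_names))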

@JackywithaWhiteDog Hi. Where did this weight come from? I can only find the 256×256 pre-trained weight in the transfer-learning folder, and that website is different from the one in this issue.

Hi @githuboflk, sorry I didn't notice your question earlier. I also used the pre-trained weights provided in the README, as you mentioned.

However, I think the checkpoint linked in this issue is available from the NVIDIA NGC Catalog.