Input: 512x512x3 random crops from 1280x720 images, normalised to [-1, 1]. Dataset: BDD100K.
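
A minimal sketch of the cropping and normalisation step, assuming 8-bit pixel values in [0, 255] (the notes do not state the input range). The names `random_crop_box` and `normalise` are hypothetical helpers, not part of any library.

```python
import random


def random_crop_box(width=1280, height=720, size=512, rng=random):
    """Pick a random size x size crop window inside a width x height image.

    Returns (left, top, right, bottom) in pixel coordinates.
    """
    left = rng.randint(0, width - size)   # randint is inclusive on both ends
    top = rng.randint(0, height - size)
    return left, top, left + size, top + size


def normalise(value):
    """Map an 8-bit intensity in [0, 255] to [-1, 1] (assumed input range)."""
    return value / 127.5 - 1.0


left, top, right, bottom = random_crop_box()
print((left, top, right, bottom), normalise(0), normalise(255))
```

The [-1, 1] range matches the Tanh at the generator output, so real and generated images live on the same scale.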
- Conv 7x7x64, stride 1, InstNorm, ReLU
- Conv 3x3x128, stride 2, InstNorm, ReLU
- Conv 3x3x256, stride 2, InstNorm, ReLU
- ResBlock 3x3x256, InstNorm, ReLU
- ResBlock 3x3x256, InstNorm, ReLU
- ResBlock 3x3x256, InstNorm, ReLU
- ResBlock 3x3x256, InstNorm, ReLU
One ResBlock is: Conv - Norm - Activation - Conv - Norm - Add (skip connection) - Activation.
No norm or activation for last layer, but add bias.
- ResBlock 3x3x256, InstNorm, ReLU
- ResBlock 3x3x256, InstNorm, ReLU
- ResBlock 3x3x256, InstNorm, ReLU
- ResBlock 3x3x256, InstNorm, ReLU
- Upsample factor 2
- Conv 5x5x128, InstNorm, ReLU (but LayerNorm might be better)
- Upsample factor 2
- Conv 5x5x64, InstNorm, ReLU (but LayerNorm might be better)
- Conv 7x7x3, Tanh
Don't forget bias in the last layer.
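
To sanity-check the generator listed above, the spatial sizes can be traced layer by layer. This is only a sketch: it assumes "same" padding for every convolution (so a stride-s layer maps a size n to ceil(n / s)) and that the upsampling layers keep the channel count, neither of which the notes state explicitly. `GENERATOR_LAYERS` and `trace_shapes` are hypothetical names.

```python
import math

# (label, output channels or None to keep them, conv stride, upsample factor)
GENERATOR_LAYERS = (
    [("Conv 7x7", 64, 1, 1), ("Conv 3x3", 128, 2, 1), ("Conv 3x3", 256, 2, 1)]
    + [("ResBlock 3x3", 256, 1, 1)] * 8
    + [
        ("Upsample", None, 1, 2),
        ("Conv 5x5", 128, 1, 1),
        ("Upsample", None, 1, 2),
        ("Conv 5x5", 64, 1, 1),
        ("Conv 7x7 + Tanh", 3, 1, 1),
    ]
)


def trace_shapes(height=512, width=512, channels=3, layers=GENERATOR_LAYERS):
    """Return [(label, (height, width, channels)), ...] after every layer."""
    shapes = [("input", (height, width, channels))]
    for label, out_channels, stride, upsample in layers:
        # Assumed "same" padding: only stride and upsampling change the size.
        height = math.ceil(height / stride) * upsample
        width = math.ceil(width / stride) * upsample
        if out_channels is not None:
            channels = out_channels
        shapes.append((label, (height, width, channels)))
    return shapes


for label, shape in trace_shapes():
    print(f"{label:16s} {shape}")
```

Under these assumptions the eight ResBlocks run at 128x128x256 and the final output is 512x512x3, the same shape as the input crop.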
- Conv 3x3x64, stride 2, LReLU
- Conv 3x3x128, stride 2, LReLU
- Conv 3x3x256, stride 2, LReLU
- Conv 3x3x512, stride 2, LReLU
- Conv 1x1x1, stride 1
Repeat this discriminator architecture 3 times (AvgPool 3x3, stride 2, applied in between).
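
One way to read the line above is as a multi-scale discriminator: the same stack is applied to the full-resolution image, then to an average-pooled copy, then to a twice-pooled copy. The sketch below computes the size of each output map under that reading, assuming padding 1 for the 3x3 convolutions and for the 3x3 average pool (the padding is not given in the notes). `conv_out` and `discriminator_output_sizes` are hypothetical names.

```python
def conv_out(size, kernel, stride, padding):
    """Standard output-size formula for a convolution or pooling window."""
    return (size + 2 * padding - kernel) // stride + 1


def discriminator_output_sizes(size=512, num_scales=3):
    """Side length of the 1-channel output map at each scale."""
    outputs = []
    for _ in range(num_scales):
        s = size
        for _ in range(4):  # four Conv 3x3, stride 2 layers (padding 1 assumed)
            s = conv_out(s, kernel=3, stride=2, padding=1)
        s = conv_out(s, kernel=1, stride=1, padding=0)  # Conv 1x1x1, stride 1
        outputs.append(s)
        # AvgPool 3x3, stride 2 (padding 1 assumed) feeds the next scale.
        size = conv_out(size, kernel=3, stride=2, padding=1)
    return outputs


print(discriminator_output_sizes())  # [32, 16, 8]
```

With a 512x512 input, each discriminator therefore scores a grid of patches rather than emitting a single real/fake value.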
- Update the discriminators' weights with the generators fixed.
- Update the generators' weights with the discriminators fixed.
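
The alternating schedule above can be illustrated on a toy scalar problem. Everything in this sketch is an assumption made for illustration: the least-squares GAN loss, the one-parameter generator, the linear discriminator, and the learning rate are not taken from the notes. The point is only that each step touches one side's parameters and treats the other side as fixed.

```python
from dataclasses import dataclass


@dataclass
class ToyGAN:
    theta: float = 0.0  # generator parameter: the generator always outputs theta
    w: float = 0.1      # discriminator weight: D(x) = w * x + b
    b: float = 0.0      # discriminator bias

    def d(self, x):
        return self.w * x + self.b


def discriminator_step(m, real, lr=0.01):
    """Minimise (D(real) - 1)^2 + D(fake)^2 over w and b; theta is fixed."""
    fake = m.theta
    err_real = m.d(real) - 1.0
    err_fake = m.d(fake)
    grad_w = 2.0 * err_real * real + 2.0 * err_fake * fake
    grad_b = 2.0 * err_real + 2.0 * err_fake
    m.w -= lr * grad_w
    m.b -= lr * grad_b


def generator_step(m, lr=0.01):
    """Minimise (D(fake) - 1)^2 over theta; w and b are fixed."""
    fake = m.theta
    grad_theta = 2.0 * (m.d(fake) - 1.0) * m.w
    m.theta -= lr * grad_theta


def train(m, real_batches):
    for real in real_batches:
        discriminator_step(m, real)  # generator fixed
        generator_step(m)            # discriminator fixed
    return m


model = train(ToyGAN(), [3.0] * 10)
print(model)
```

In a real framework the same separation is usually obtained by giving each side its own optimiser and detaching the generator output during the discriminator update.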