/domain-adapt

Primary LanguagePythonMIT LicenseMIT

Unsupervised Image-to-Image Translation Network

Inputs

512x512x3 random crops from a 1280x720 image, normalised to [-1, 1]. Dataset from BDD100k.

Encoder

  • Conv 7x7x64, stride 1, InstNorm, ReLU
  • Conv 3x3x128, stride 2, InstNorm, ReLU
  • Conv 3x3x256, stride 2, InstNorm, ReLU
  • ResBlock 3x3x256, InstNorm, ReLU
  • ResBlock 3x3x256, InstNorm, ReLU
  • ResBlock 3x3x256, InstNorm, ReLU
  • ResBlock 3x3x256, InstNorm, ReLU

One ResBlock is: Conv - Norm - Activ - Conv - Norm - Add - Activ.

No norm or activation for last layer, but add bias.

Decoder

  • ResBlock 3x3x256, InstNorm, ReLU
  • ResBlock 3x3x256, InstNorm, ReLU
  • ResBlock 3x3x256, InstNorm, ReLU
  • ResBlock 3x3x256, InstNorm, ReLU
  • Upsample factor 2
  • Conv 5x5x128, InstNorm, ReLU (but LayerNorm might be better)
  • Upsample factor 2
  • Conv 5x5x64, InstNorm, ReLU (but LayerNorm might be better)
  • Conv 7x7x3, Tanh

Don't forget bias in the last layer.

Discriminator

  • Conv 3x3x64, stride 2, LReLU
  • Conv 3x3x128, stride 2, LReLU
  • Conv 3x3x256, stride 2, LReLU
  • Conv 3x3x512, stride 2, LReLU
  • Conv 1x1x1, stride 1

Repeat this discri architecture 3 times (AvgPool 3x3, stride 2 applied in-between).

Training

  • Update discriminators weights with generators fixed.
  • Update generators weights with discriminator fixed.