PyTorch implementation of paper: Qifeng Chen and Vladlen Koltun. Photographic Image Synthesis with Cascaded Refinement Networks.
Goal is to synthesise photographic images from semantic maps. Process can be seen as inverse segmentation.
Cascaded Refinement Network is composed of refinement modules. Each module operates at different spatial scale. It takes as input concatenation of downsampled semantic map where each class gets different depth channel and output from previous module.
Single refinement module Mi does basic operations like convolution, layer normalization and leaky ReLU.
The Cascaded Refinement Network generates not one, buk k diversed images. After passing each one of them through VGG19 network, perceptual loss with respect to reference texture image from dataset is computed. Final loss is defined as minimum from perceptual losses computed for k diversed images.
-
Download VGG19 model.
-
Download semantic maps and photographic images from Cityscapes Dataset, and put folders gtFine and leftImg8bit in cityscape directory.
python semantic2labels.py
python train.py
python eval.py