Wide-Range Image Blending - PyTorch Implementation


The objective of the proposed wide-range image blending model is to learn to generate new content for the intermediate region that connects two different input photos, leading to a semantically coherent and spatially smooth panoramic image. Our full model is shown below; in the following we sequentially describe our model designs, including the image context encoder-decoder, the Bidirectional Content Transfer module, and the contextual attention mechanism on the skip connection, as well as the training details.

This repository contains the PyTorch implementation of the following paper:

Bridging the Visual Gap: Wide-Range Image Blending
Chia-Ni Lu, Ya-Chu Chang, Wei-Chen Chiu
https://arxiv.org/abs/2103.15149

Abstract: In this paper we propose a new problem scenario in image processing, wide-range image blending, which aims to smoothly merge two different input photos into a panorama by generating novel image content for the intermediate region between them. Although such problem is closely related to the topics of image inpainting, image outpainting, and image blending, none of the approaches from these topics is able to easily address it. We introduce an effective deep-learning model to realize wide-range image blending, where a novel Bidirectional Content Transfer module is proposed to perform the conditional prediction for the feature representation of the intermediate region via recurrent neural networks. In addition to ensuring the spatial and semantic consistency during the blending, we also adopt the contextual attention mechanism as well as the adversarial learning scheme in our proposed method for improving the visual quality of the resultant panorama. We experimentally demonstrate that our proposed method is not only able to produce visually appealing results for wide-range image blending, but also able to provide superior performance with respect to several baselines built upon the state-of-the-art image inpainting and outpainting approaches.

Architecture


(a) Full Model: Our full model takes I_left and I_right as input, and compresses them into compact representations f̃_left and f̃_right individually via the encoder. Afterwards, our novel Bidirectional Content Transfer (BCT) module predicts f̃_mid from f̃_left and f̃_right. Lastly, based on the feature f̃, obtained by concatenating {f̃_left, f̃_mid, f̃_right} along the horizontal direction, the decoder generates our final result Ĩ. Note that there is a contextual attention mechanism on the skip connection between the encoder and decoder, which helps to enrich the texture and details of our blending result.
(b) LSTM Encoder: The architecture of the LSTM encoder E_BCT in our BCT module, which encodes the information of f̃_left or f̃_right to produce c̃_left or c̃_right.
(c) LSTM Decoder: The architecture of the conditional LSTM decoder D_BCT in our BCT module, which takes the condition c̃_right (respectively c̃_left) as well as the input f̃_left (respectively f̃_right) to predict the feature map →f_mid (respectively ←f_mid). The prediction f̃_mid for the intermediate region, which blends between f̃_left and f̃_right, is then obtained by concatenating →f_mid and ←f_mid along the channel dimension and passing the result through a 1 × 1 convolutional layer.
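To make the data flow above concrete, below is a minimal PyTorch sketch of the BCT idea, treating each feature map as a sequence of column vectors scanned by an LSTM. All class names, the single-layer LSTMs, the autoregressive roll-out, and the column-flipping convention are our own simplifications for illustration; they are not the repository's actual implementation (which also includes the contextual attention on the skip connection, omitted here).

# Simplified sketch of the Bidirectional Content Transfer (BCT) idea.
# A feature map (B, C, H, W) is treated as a sequence of W column vectors of
# size C*H, scanned by an LSTM. Names and sizes here are illustrative only.
import torch
import torch.nn as nn

class ColumnLSTMEncoder(nn.Module):
    """Encodes a feature map column-by-column into a compact condition (c-tilde)."""
    def __init__(self, c, h, hidden):
        super().__init__()
        self.lstm = nn.LSTM(input_size=c * h, hidden_size=hidden, batch_first=True)

    def forward(self, f):                                   # f: (B, C, H, W)
        b, c, h, w = f.shape
        seq = f.permute(0, 3, 1, 2).reshape(b, w, c * h)    # (B, W, C*H)
        _, (h_n, c_n) = self.lstm(seq)
        return (h_n, c_n)                                   # e.g. c~_left

class ConditionalColumnLSTMDecoder(nn.Module):
    """Predicts the middle columns, conditioned on the state from the opposite side."""
    def __init__(self, c, h, hidden, n_mid_cols):
        super().__init__()
        self.c, self.h, self.n_mid_cols = c, h, n_mid_cols
        self.lstm = nn.LSTM(input_size=c * h, hidden_size=hidden, batch_first=True)
        self.proj = nn.Linear(hidden, c * h)

    def forward(self, f_in, condition):                     # f_in: (B, C, H, W)
        b, c, h, w = f_in.shape
        seq = f_in.permute(0, 3, 1, 2).reshape(b, w, c * h)
        out, state = self.lstm(seq, condition)              # warm up on the known side
        cols, col = [], out[:, -1:, :]
        for _ in range(self.n_mid_cols):                    # autoregressive roll-out
            col = self.proj(col)                            # next column, (B, 1, C*H)
            cols.append(col)
            col, state = self.lstm(col, state)
        mid = torch.cat(cols, dim=1)                        # (B, n_mid_cols, C*H)
        return mid.reshape(b, self.n_mid_cols, c, h).permute(0, 2, 3, 1)

class BCT(nn.Module):
    """Fuses the forward (left->mid) and backward (right->mid) predictions."""
    def __init__(self, c, h, hidden, n_mid_cols):
        super().__init__()
        self.enc = ColumnLSTMEncoder(c, h, hidden)
        self.dec = ConditionalColumnLSTMDecoder(c, h, hidden, n_mid_cols)
        self.fuse = nn.Conv2d(2 * c, c, kernel_size=1)      # 1x1 conv over channel concat

    def forward(self, f_left, f_right):
        c_left = self.enc(f_left)
        c_right = self.enc(torch.flip(f_right, dims=[3]))   # scan the right part toward the middle
        fwd = self.dec(f_left, c_right)                     # -->f_mid
        bwd = self.dec(torch.flip(f_right, dims=[3]), c_left)
        bwd = torch.flip(bwd, dims=[3])                     # <--f_mid back to left-to-right order
        return self.fuse(torch.cat([fwd, bwd], dim=1))      # f~_mid

# Usage, with a hypothetical encoder/decoder pair:
#   f_left, f_right = encoder(I_left), encoder(I_right)
#   f_mid = BCT(c, h, hidden, n_mid_cols)(f_left, f_right)
#   result = decoder(torch.cat([f_left, f_mid, f_right], dim=3))  # concat along width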

Two-Stage Training

  1. Self-Reconstruction Stage: We adopt the objective of self-reconstruction, where the two input photos {I_left, I_right} and the intermediate region are obtained from the same image. This is achieved by first splitting a wide image vertically and equally into three parts, then taking the leftmost one-third and the rightmost one-third as I_left and I_right respectively, while the middle one-third is treated as the ground truth I_mid for the generated intermediate region Ĩ_mid (a minimal data-splitting sketch is provided after the training commands below).

    • We adopt the scenery dataset proposed in "Very Long Natural Scenery Image Prediction by Outpainting" for our experiments, splitting it into 5040 training images and 1000 testing images.
    • Download the dataset with our train/test split from here and put it under data/.

    • Run the training code for the self-reconstruction stage
    python train_SR.py
    
    • If you want to train the model with your own dataset
    python train_SR.py --train_data_dir YOUR_DATA_PATH
    
  2. Fine-Tuning Stage: We keep using the objective of self-reconstruction as in the previous training stage, but additionally consider another objective based on training samples in which I_left and I_right are obtained from different images (i.e. different scenes). As there is no ground truth Ĩ_mid for such training samples, this additional training objective is based on adversarial learning (a generic illustration of such an adversarial update is also provided after the training commands below).

    • After finishing the training of the self-reconstruction stage, move the latest model weights from checkpoints/SR_Stage/ to weights/ (or use the pre-trained weights from the self-reconstruction stage), and run the training code for the fine-tuning stage (second stage)
    python train_FT.py --load_pretrain True
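
For reference, here is a minimal sketch of how a self-reconstruction sample (step 1 above) can be built by splitting a wide image into three equal vertical parts. This is an illustration only, not the repository's data loader; the function name and the use of PIL/torchvision are our own choices.

# Illustration of building a self-reconstruction sample: the left and right
# thirds of a wide image become the inputs, while the middle third is the
# ground truth for the generated intermediate region. Not the repo's loader.
from PIL import Image
from torchvision import transforms

def make_self_reconstruction_sample(path):
    img = Image.open(path).convert('RGB')
    w, h = img.size
    third = w // 3
    to_tensor = transforms.ToTensor()
    I_left  = to_tensor(img.crop((0,         0, third,     h)))
    I_mid   = to_tensor(img.crop((third,     0, 2 * third, h)))  # ground truth I_mid
    I_right = to_tensor(img.crop((2 * third, 0, 3 * third, h)))
    return I_left, I_mid, I_right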
    
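As a rough illustration of the fine-tuning objective in step 2, the following sketch shows one generic adversarial update for unpaired samples, where the generated middle region has no ground truth and the resulting panorama is judged only by a discriminator. The generator/discriminator interfaces, the hinge-style loss, and the use of a real wide image I_real as the "real" sample are assumptions made for illustration and do not reproduce the repository's exact losses or training loop.

# Generic adversarial update for unpaired fine-tuning samples (illustrative only).
# `generator`, `discriminator`, and `I_real` (a real wide image used as the
# "real" example) are placeholders, not the repository's actual interfaces.
import torch.nn.functional as F

def fine_tune_step(generator, discriminator, opt_g, opt_d, I_left, I_right, I_real):
    blended = generator(I_left, I_right)          # panorama with a generated middle region

    # discriminator update: real wide images vs. blended outputs (hinge loss here)
    opt_d.zero_grad()
    loss_d = F.relu(1.0 - discriminator(I_real)).mean() \
           + F.relu(1.0 + discriminator(blended.detach())).mean()
    loss_d.backward()
    opt_d.step()

    # generator update: make the blended panorama look real to the discriminator
    opt_g.zero_grad()
    loss_g = -discriminator(blended).mean()
    loss_g.backward()
    opt_g.step()
    return loss_g.item(), loss_d.item()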

Testing

Download our pre-trained model weights from here and put them under weights/.

Test the sample data provided in this repo:

python test.py

Or download our paired test data from here and put them under data/.
Then run the testing code:

python test.py --test_data_dir_1 ./data/scenery6000_paired/test/input1/
               --test_data_dir_2 ./data/scenery6000_paired/test/input2/

Or test the model on your own data:

python test.py --test_data_dir_1 YOUR_DATA_PATH_1
               --test_data_dir_2 YOUR_DATA_PATH_2
               --save_dir YOUR_SAVE_PATH

If your test data isn't paired already, add --rand_pair True to randomly pair the data.