
[CVPR 2022] We unify pixel-to-pixel transformation and color-to-color transformation in a coherent framework for high-resolution image harmonization. We also release 100 high-resolution real composite images for evaluation.

CDTNet-High-Resolution-Image-Harmonization

This is the official repository for the following paper:

High-Resolution Image Harmonization via Collaborative Dual Transformations [arXiv]

Wenyan Cong, Xinhao Tao, Li Niu, Jing Liang, Xuesong Gao, Qihao Sun, Liqing Zhang
Accepted by CVPR 2022.

We propose a high-resolution image harmonization network named CDTNet to combine pixel-to-pixel transformation and RGB-to-RGB transformation coherently in an end-to-end framework. As shown in the image below, our CDTNet consists of a low-resolution generator for pixel-to-pixel transformation, a color mapping module for RGB-to-RGB transformation, and a refinement module to take advantage of both.
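To make the idea of an RGB-to-RGB transformation concrete, here is a minimal sketch of applying a 3D lookup table (LUT) to an image with trilinear interpolation. It is an illustration only, not the released CDTNet code: the acknowledgement credits 3D LUT, so a LUT is assumed here as the color mapping, and the function names `identity_lut` and `apply_lut`, the LUT size, and the `[0, 1]` float image convention are all hypothetical choices.

```python
import numpy as np


def identity_lut(size=17):
    """Build a LUT of shape (size, size, size, 3) that maps each color to itself."""
    grid = np.linspace(0.0, 1.0, size)
    r, g, b = np.meshgrid(grid, grid, grid, indexing="ij")
    return np.stack([r, g, b], axis=-1)


def apply_lut(image, lut):
    """Map an H x W x 3 float image in [0, 1] through a 3D LUT.

    Each output color is a trilinear blend of the 8 LUT entries that
    surround the input color in RGB space.
    """
    size = lut.shape[0]  # assumed >= 2
    pos = np.clip(image, 0.0, 1.0) * (size - 1)
    # Lower corner of the enclosing LUT cell, kept inside the grid.
    lo = np.minimum(np.floor(pos).astype(np.int64), size - 2)
    frac = pos - lo
    out = np.zeros(image.shape, dtype=np.float64)
    for dr in (0, 1):
        for dg in (0, 1):
            for db in (0, 1):
                wr = frac[..., 0] if dr else 1.0 - frac[..., 0]
                wg = frac[..., 1] if dg else 1.0 - frac[..., 1]
                wb = frac[..., 2] if db else 1.0 - frac[..., 2]
                corner = lut[lo[..., 0] + dr, lo[..., 1] + dg, lo[..., 2] + db]
                out += (wr * wg * wb)[..., None] * corner
    return out
```

Because the mapping depends only on each pixel's color and not on its position, it can be applied at any resolution at a cost linear in the number of pixels, which is what makes a color mapping attractive for high-resolution inputs.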


Unfortunately, the code and model cannot be made public because of our collaboration with Hisense, but the network can be readily reimplemented from publicly available code. We provide the two datasets used in our paper and the harmonization results of our method for comparison. More resources (code, model, data, results) for academic use are available upon request.

By the way, an unofficial implementation of our paper can be found here.

Datasets

1. HAdobe5k

HAdobe5k is one of the four synthesized sub-datasets in the iHarmony4 dataset, which is the benchmark dataset for image harmonization. Specifically, HAdobe5k is generated from the MIT-Adobe FiveK dataset and contains 21597 image triplets (composite image, real image, mask) as shown below, of which 19437 triplets are used for training and 2160 triplets are used for testing. The official training/test split can be found on Baidu Cloud (Alternative_address).

MIT-Adobe FiveK provides 6 retouched versions of each image, so we manually segment the foreground region and exchange foregrounds between 2 versions to generate composite images. The high-resolution images in the HAdobe5k sub-dataset have varying resolutions, ranging from 1296 to 6048, and can be downloaded from Baidu Cloud (Alternative_address).
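The foreground exchange described above amounts to a masked copy between two versions of the same photo. The following is a minimal sketch under stated assumptions: both versions are aligned arrays of identical size, the mask is a single-channel array where values above 0.5 mark the foreground, and the name `make_composite` is hypothetical rather than part of any released tool.

```python
import numpy as np


def make_composite(real, other_version, mask):
    """Paste the foreground of `other_version` onto `real`.

    real, other_version: H x W x 3 arrays of the same photo in two
    retouched versions. mask: H x W array, foreground where mask > 0.5.
    """
    if real.shape != other_version.shape:
        raise ValueError("both versions must have the same shape")
    if real.shape[:2] != mask.shape:
        raise ValueError("mask must match the image height and width")
    foreground = (mask > 0.5)[..., None]  # add a channel axis for broadcasting
    return np.where(foreground, other_version, real)
```

The result, together with `real` and `mask`, forms one (composite image, real image, mask) triplet: the background is unchanged, and only the foreground carries the color statistics of the other version.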

2. 100 High-Resolution Real Composite Images

Since the composite images in HAdobe5k are synthetic, we additionally provide 100 high-resolution real composite images for qualitative comparison in real scenarios. They are provided as image pairs (composite image, mask) and are generated from Open Image Dataset V6 and Flickr.

Open Image Dataset V6 contains ~9M images with 2.8M instance segmentation annotations across 350 categories, and many of its images are collected from Flickr at high resolution. The foreground images are therefore collected from the whole Open Image Dataset V6, and the provided instance segmentations are used to crop the foregrounds. The background images are collected from both Open Image Dataset V6 and Flickr, taking resolution and semantics into account. The cropped foregrounds and background images are then combined using Photoshop, which yields obviously inharmonious composite images.

The 100 high-resolution real composite images have varying resolutions, ranging from 1024 to 6016, and can be downloaded from Baidu Cloud (access code: vnrp) (Alternative_address).

Results

1. High-resolution (1024×1024 and 2048×2048) results on HAdobe5k test set

We test our CDTNet on 1024×1024 and 2048×2048 images from the HAdobe5k dataset and report harmonization performance in terms of MSE, PSNR, fMSE, and SSIM. We also release all harmonized results at both resolutions. Because the released results are JPEG-compressed, metrics computed on them will, as expected, be somewhat worse than the performance we report.

| Image Size | Model | MSE | PSNR | fMSE | SSIM | Test Images Download |
| --- | --- | --- | --- | --- | --- | --- |
| 1024×1024 | CDTNet-256 | 21.24 | 38.77 | 152.13 | 0.9868 | Baidu Cloud (access code: i8l1) |
| 2048×2048 | CDTNet-512 | 23.35 | 38.45 | 159.13 | 0.9853 | Baidu Cloud (access code: nu2h) |
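For readers who want to score the released results themselves, the sketch below computes MSE, PSNR, and fMSE for one image. It is not the evaluation script used for the paper: it assumes 8-bit images on a 0 to 255 scale, takes fMSE to be the MSE over foreground pixels only (mask above 0.5), and omits SSIM. The function names are hypothetical.

```python
import numpy as np


def mse(pred, target):
    """Mean squared error over all pixels and channels (0-255 scale)."""
    diff = pred.astype(np.float64) - target.astype(np.float64)
    return float(np.mean(diff ** 2))


def psnr(pred, target, max_val=255.0):
    """Peak signal-to-noise ratio in dB; infinite for identical images."""
    err = mse(pred, target)
    if err == 0.0:
        return float("inf")
    return float(10.0 * np.log10(max_val ** 2 / err))


def fmse(pred, target, mask):
    """Foreground MSE: squared error averaged over masked pixels only."""
    foreground = mask > 0.5  # H x W boolean array
    if not foreground.any():
        raise ValueError("mask contains no foreground pixels")
    diff = pred.astype(np.float64) - target.astype(np.float64)
    return float(np.mean(diff[foreground] ** 2))
```

Averaging these per-image values over the test set gives dataset-level numbers. Since fMSE ignores the unchanged background, it is typically much larger than MSE when the foreground is small, which is consistent with the gap between the two columns in the table above.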

We show several results at 1024×1024 resolution below, where the yellow boxes zoom in on particular regions for closer inspection.

2. High-resolution (1024×1024) results on 100 real composite images

We test our CDTNet on 100 high-resolution real composite images as mentioned above, and provide the results on Baidu Cloud (access code: lr7k).

3. Low-resolution (256×256) results on iHarmony4 test set

We also test our CDTNet on 256×256 images from the iHarmony4 dataset and compare the results with iS2AM. Note that the performance of iS2AM is measured using its publicly released model on [GitHub]. We also provide all harmonized results on Baidu Cloud (access code: ob5n).

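| Model | HCOCO MSE | HCOCO PSNR | HAdobe5k MSE | HAdobe5k PSNR | HFlickr MSE | HFlickr PSNR | Hday2night MSE | Hday2night PSNR | All MSE | All PSNR |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| iS2AM | 16.48 | 39.16 | 22.60 | 37.24 | 69.67 | 33.56 | 40.59 | 37.72 | 24.65 | 37.95 |
| CDTNet-256 | 16.25 | 39.15 | 20.62 | 38.24 | 68.61 | 33.55 | 36.72 | 37.95 | 23.75 | 38.23 |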

4. Low-resolution (256×256) results on 99 real composite images

We also test our CDTNet on another 99 real composite images used in previous works, and provide the results on Baidu Cloud (access code: i6e8).

Other Resources

Acknowledgement

Our code borrows heavily from iSSAM and 3D LUT.