Image_Harmonization_Datasets: iHarmony4

Try this online demo for image harmonization and have fun!

Image harmonization aims to harmonize a composite image by adjusting the appearance of its foreground so that it is consistent with the background region. A real composite image is created by combining the foreground region of one image with the background of another image. Although real composite images are easy to create, producing their harmonized outputs is too time-consuming and skill-demanding. Consequently, there is no high-quality, publicly available dataset for image harmonization.
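For concreteness, the compositing step can be written as below. The symbols are our own labels, not notation defined elsewhere in this README: $I_f$ is the image supplying the foreground, $I_b$ the image supplying the background, $M$ a binary foreground mask, and $\circ$ the element-wise product. A harmonization model then edits only the region where $M = 1$.

```latex
\tilde{I} = M \circ I_f + (1 - M) \circ I_b
```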

We release the first large-scale image harmonization dataset, iHarmony4. It contains 4 sub-datasets: HCOCO, HAdobe5k, HFlickr, and Hday2night, each of which contains synthesized composite images, the foreground masks of the composite images, and the corresponding real images. The iHarmony4 dataset is provided in Baidu Cloud (access code: kqz3) and OneDrive.

	HCOCO	HAdobe5k	HFlickr	Hday2night	iHarmony4
Training set	38545	19437	7449	311	65742
Test set	4283	2160	828	133	7404
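The loader below is a minimal sketch of pairing each composite with its mask and real image. It assumes a naming scheme in which a composite such as `c100017_1585_1.jpg` in `composite_images/` maps to the mask `c100017_1585.png` in `masks/` and the real image `c100017.jpg` in `real_images/`, and that each split list holds composite paths relative to the sub-dataset folder. Both the scheme and the helper names (`triplet_paths`, `read_triplets`) are assumptions for illustration; the data loader in the DoveNet code is the authoritative reference.

```python
from pathlib import PurePosixPath
from typing import Iterable, List, Tuple


def triplet_paths(composite_path: str) -> Tuple[str, str, str]:
    """Map a composite image path to its (composite, mask, real) paths.

    ASSUMED naming scheme (verify against your download):
      <sub>/composite_images/<real>_<maskid>_<compid>.jpg
      <sub>/masks/<real>_<maskid>.png
      <sub>/real_images/<real>.jpg
    """
    p = PurePosixPath(composite_path)
    sub_root = p.parent.parent                 # e.g. .../HCOCO
    parts = p.stem.split("_")                  # ['c100017', '1585', '1']
    mask_name = "_".join(parts[:-1])           # drop the composite id
    real_name = "_".join(parts[:-2])           # drop mask id and composite id
    mask = sub_root / "masks" / f"{mask_name}.png"
    real = sub_root / "real_images" / f"{real_name}.jpg"
    return str(p), str(mask), str(real)


def read_triplets(dataset_root: str, lines: Iterable[str]) -> List[Tuple[str, str, str]]:
    """Resolve each non-empty line of a split list (a relative composite path)."""
    root = PurePosixPath(dataset_root)
    return [triplet_paths(str(root / line.strip())) for line in lines if line.strip()]
```

Passing an open split-list file as `lines` yields one (composite, mask, real) tuple per listed composite. Joining the remaining underscore-separated fields, rather than taking only the first one, keeps real-image ids that themselves contain underscores intact.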

You can augment small-scale datasets with our SycoNet, which synthesizes high-quality composite images from real images.

We also construct the ccHarmony dataset using a color checker, which more faithfully reflects illumination variation.

1. HCOCO

HCOCO, containing 42k synthesized composite images, is generated based on the Microsoft COCO dataset. The foreground region is the corresponding object segmentation mask provided by COCO. Within the foreground region, the appearance of the COCO image is edited using various color transfer methods. The HCOCO sub-dataset and training/testing split are provided in Baidu Cloud (access code: ab5e) and OneDrive.

2. HAdobe5k

HAdobe5k is generated based on the MIT-Adobe FiveK dataset, which provides 6 versions of each image (the original photo and 5 versions retouched by different experts). We manually segment the foreground region and exchange foregrounds between 2 versions of the same image. The HAdobe5k sub-dataset and training/testing split are provided in Baidu Cloud and OneDrive.
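The foreground exchange can be sketched as follows, assuming the two versions of the same photo are stored as aligned float arrays and that the version supplying the background serves as the real image. `exchange_foreground` is a hypothetical helper, not part of the released code.

```python
import numpy as np


def exchange_foreground(version_a, version_b, mask):
    """Paste the foreground of version_a onto the background of version_b.

    version_a, version_b: H x W x 3 float arrays of the same photo in two
    retouched versions; mask: H x W array in {0, 1} marking the foreground.
    Returns (composite, real), where the real image is assumed to be version_b.
    """
    m = np.asarray(mask, dtype=np.float64)[..., None]  # H x W x 1 for broadcasting
    composite = m * version_a + (1.0 - m) * version_b
    return composite, version_b
```

Because both inputs depict the same scene, the only difference between the composite and the real image is the retouching style inside the mask.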

3. HFlickr

We collected 4833 images from Flickr. After manually segmenting the foreground regions, we use the same method as HCOCO to generate the HFlickr sub-dataset. The HFlickr sub-dataset and training/testing split are provided in Baidu Cloud and OneDrive.

4. Hday2night

Hday2night is generated based on the day2night dataset. We manually segment the foreground region, which is cropped and overlaid on another image captured at a different time. The Hday2night sub-dataset and training/testing split are provided in Baidu Cloud and OneDrive.

Color Transfer Methods

To generate synthesized composite images, color transfer methods are adopted to transfer color information from reference images to real images. Color transfer methods can be categorized into four groups according to whether they are parametric or non-parametric and whether they operate in a correlated or decorrelated color space, so we select one representative method from each group. Thanks to Wei Xu for releasing the code of color transfer methods 1, 2 and 3 with their survey paper, we were able to implement foreground-specific versions of these methods based on that code. The source code of IDT regrain color transfer is downloaded from the author's GitHub.

1. global color transfer

--Parametric method in decorrelated color space. Implementation of paper "Color transfer between images" [pdf].
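As an illustration of a foreground-only variant, the sketch below matches the per-channel mean and standard deviation of the masked foreground to those of a reference in the lαβ space of Reinhard et al. It assumes RGB images as float arrays in [0, 1], a boolean foreground mask, and reference statistics taken from either the whole reference image or an optional reference mask. The function names are hypothetical; this is not the released MATLAB code.

```python
import numpy as np

# RGB -> LMS matrix and log-LMS -> l-alpha-beta transform from Reinhard et al.
RGB2LMS = np.array([[0.3811, 0.5783, 0.0402],
                    [0.1967, 0.7244, 0.0782],
                    [0.0241, 0.1288, 0.8444]])
LMS2LAB = np.diag([1 / np.sqrt(3), 1 / np.sqrt(6), 1 / np.sqrt(2)]) @ np.array(
    [[1.0, 1.0, 1.0], [1.0, 1.0, -2.0], [1.0, -1.0, 0.0]])
EPS = 1e-6


def rgb_to_lab(rgb):
    """N x 3 RGB in [0, 1] -> N x 3 l-alpha-beta."""
    lms = rgb @ RGB2LMS.T
    return np.log10(np.maximum(lms, EPS)) @ LMS2LAB.T


def lab_to_rgb(lab):
    """Inverse of rgb_to_lab (up to the EPS clamp)."""
    lms = 10.0 ** (lab @ np.linalg.inv(LMS2LAB).T)
    return lms @ np.linalg.inv(RGB2LMS).T


def reinhard_foreground(target, mask, reference, ref_mask=None):
    """Match foreground mean/std to the reference in l-alpha-beta space.

    target, reference: H x W x 3 float RGB in [0, 1]; mask, ref_mask: H x W.
    Pixels outside `mask` are returned unchanged.
    """
    mask = mask.astype(bool)
    out = target.astype(np.float64).copy()
    ref_rgb = reference[ref_mask.astype(bool)] if ref_mask is not None else reference.reshape(-1, 3)
    fg, ref = rgb_to_lab(out[mask]), rgb_to_lab(ref_rgb)
    # Normalize each foreground channel, then rescale to the reference statistics.
    scaled = (fg - fg.mean(0)) / (fg.std(0) + EPS) * ref.std(0) + ref.mean(0)
    out[mask] = np.clip(lab_to_rgb(scaled), 0.0, 1.0)
    return out
```

Because lαβ channels are approximately decorrelated, matching each channel independently is reasonable here; the correlated-space methods instead treat the three channels jointly.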

2. global color transfer in RGB color space

--Parametric method in correlated color space. Implementation of paper "Color transfer in correlated color space" [pdf].
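A sketch of a correlated-space counterpart matches the mean vector and the full 3×3 covariance of the foreground RGB pixels to those of the reference. It uses symmetric matrix square roots, which is one valid way to map one covariance onto another and not necessarily the exact decomposition used in the paper. It assumes float RGB images in [0, 1] and a boolean foreground mask; the function names are hypothetical.

```python
import numpy as np


def sqrt_and_inv_sqrt(cov, eps=1e-8):
    """Symmetric square root and inverse square root via eigendecomposition."""
    vals, vecs = np.linalg.eigh(cov)
    vals = np.maximum(vals, eps)  # guard against (near-)singular covariances
    sqrt = vecs @ np.diag(np.sqrt(vals)) @ vecs.T
    inv_sqrt = vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T
    return sqrt, inv_sqrt


def rgb_covariance_transfer(target, mask, reference, ref_mask=None):
    """Match foreground mean and RGB covariance to the reference pixels."""
    mask = mask.astype(bool)
    out = target.astype(np.float64).copy()
    fg = out[mask]
    ref = reference[ref_mask.astype(bool)] if ref_mask is not None else reference.reshape(-1, 3)
    mu_s, mu_r = fg.mean(0), ref.mean(0)
    _, inv_sqrt_s = sqrt_and_inv_sqrt(np.cov(fg, rowvar=False))
    sqrt_r, _ = sqrt_and_inv_sqrt(np.cov(ref, rowvar=False))
    # Whiten the foreground colors, then re-color them with the reference covariance.
    transform = sqrt_r @ inv_sqrt_s
    out[mask] = np.clip((fg - mu_s) @ transform.T + mu_r, 0.0, 1.0)
    return out
```

With $T = C_r^{1/2} C_s^{-1/2}$, the transformed pixels have covariance $T C_s T^\top = C_r$, so cross-channel correlations of the reference are reproduced, unless clipping to [0, 1] intervenes.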

3. cumulative histogram mapping

--Non-parametric method in decorrelated color space. Implementation of paper "Histogram-based prefiltering for luminance and chrominance compensation of multiview video" [pdf].
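A possible foreground-only version of cumulative histogram mapping is sketched below: each channel of the foreground is mapped through its cumulative histogram onto the cumulative histogram of the reference. Using full-range BT.601 YCbCr as the decorrelated color space is our assumption, as are the function names; images are float RGB arrays in [0, 1] with a boolean mask.

```python
import numpy as np

# Full-range BT.601 RGB -> YCbCr (chroma offset 0.5 for [0, 1] inputs).
RGB2YCC = np.array([[0.299, 0.587, 0.114],
                    [-0.168736, -0.331264, 0.5],
                    [0.5, -0.418688, -0.081312]])
OFFSET = np.array([0.0, 0.5, 0.5])


def match_channel(src, ref):
    """Map 1-D `src` through its cumulative histogram onto that of `ref`."""
    _, s_idx, s_counts = np.unique(src, return_inverse=True, return_counts=True)
    r_vals, r_counts = np.unique(ref, return_counts=True)
    s_cdf = np.cumsum(s_counts) / src.size
    r_cdf = np.cumsum(r_counts) / ref.size
    # Each source level takes the reference level at the same cumulative frequency.
    return np.interp(s_cdf, r_cdf, r_vals)[s_idx]


def histogram_mapping_foreground(target, mask, reference, ref_mask=None):
    """Per-channel cumulative histogram mapping of the foreground in YCbCr."""
    mask = mask.astype(bool)
    out = target.astype(np.float64).copy()
    ref_rgb = reference[ref_mask.astype(bool)] if ref_mask is not None else reference.reshape(-1, 3)
    fg = out[mask] @ RGB2YCC.T + OFFSET
    ref = ref_rgb @ RGB2YCC.T + OFFSET
    matched = np.stack([match_channel(fg[:, c], ref[:, c]) for c in range(3)], axis=1)
    out[mask] = np.clip((matched - OFFSET) @ np.linalg.inv(RGB2YCC).T, 0.0, 1.0)
    return out
```

Unlike the parametric methods, this mapping reproduces the full shape of each reference marginal, not just its mean and spread.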

4. IDT regrain color transfer

--Non-parametric method in correlated color space. Implementation of paper "Automated colour grading using colour distribution transfer" [pdf].
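The core of IDT can be sketched as repeated random rotations of the color space followed by 1-D quantile matching along each rotated axis. This sketch draws random orthonormal bases via QR as a simple stand-in for the rotation scheme of the paper and omits the regrain post-processing entirely. Function names are hypothetical and inputs are float RGB in [0, 1].

```python
import numpy as np


def match_1d(src, ref):
    """Quantile-match the 1-D samples `src` to the distribution of `ref`."""
    order = np.argsort(src, kind="stable")
    q_src = (np.arange(src.size) + 0.5) / src.size
    q_ref = (np.arange(ref.size) + 0.5) / ref.size
    out = np.empty(src.size, dtype=np.float64)
    # The k-th smallest source value takes the reference value at the same quantile.
    out[order] = np.interp(q_src, q_ref, np.sort(ref))
    return out


def idt_transfer(source, reference, n_iter=20, seed=0):
    """Iterative distribution transfer between N x 3 and M x 3 pixel arrays."""
    rng = np.random.default_rng(seed)
    x = np.asarray(source, dtype=np.float64).copy()
    reference = np.asarray(reference, dtype=np.float64)
    for _ in range(n_iter):
        # Random orthonormal axes from the QR decomposition of a Gaussian matrix.
        q = np.linalg.qr(rng.standard_normal((3, 3)))[0]
        xs, rs = x @ q, reference @ q  # coordinates along the rotated axes
        for c in range(3):
            xs[:, c] = match_1d(xs[:, c], rs[:, c])
        x = xs @ q.T  # rotate back; q is orthogonal
    return x


def idt_foreground(target, mask, reference, ref_mask=None, n_iter=20, seed=0):
    """Apply IDT to the foreground pixels only (no regrain step)."""
    mask = mask.astype(bool)
    out = target.astype(np.float64).copy()
    ref = reference[ref_mask.astype(bool)] if ref_mask is not None else reference.reshape(-1, 3)
    out[mask] = np.clip(idt_transfer(out[mask], ref, n_iter, seed), 0.0, 1.0)
    return out
```

With equal numbers of source and reference pixels, each rotated coordinate after the final iteration is a permutation of the reference values, so the output mean equals the reference mean; more iterations bring the full 3-D distribution closer.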

Our DoveNet

Here we provide the PyTorch implementation and the trained model of our DoveNet.

Prerequisites

Linux
Python 3
CPU or NVIDIA GPU + CUDA CuDNN

Getting Started

Installation

Clone this repo:

git clone https://github.com/bcmi/Image_Harmonization_Datasets.git
cd Image_Harmonization_Datasets

Download the iHarmony4 dataset.
Install PyTorch 1.2 and other dependencies (e.g., torchvision, visdom and dominate).
- For Conda users, you can create a new Conda environment using conda env create -f environment.yaml.

DoveNet train/test

To view training results and loss plots, run python -m visdom.server and click the URL http://localhost:8097.
Train a model:

#!./scripts/train_dovenet.sh
python train.py  --dataset_root <path_to_iHarmony4_dataset> --name experiment_name  --model dovenet --dataset_mode iharmony4 --is_train 1  --gan_mode wgangp  --norm instance --no_flip --preprocess none --netG s2ad

Remember to specify dataset_root and name in the corresponding places.

To see more intermediate results, you can check out visdom or ./checkpoints/experiment_name/web/index.html.

Test the model:

#!./scripts/test_dovenet.sh
python test.py --dataset_root <path_to_iHarmony4_dataset> --name experiment_name --model dovenet --dataset_mode iharmony4 --netG s2ad --is_train 0  --norm instance --no_flip --preprocess none --num_test 7404

Remember to specify dataset_root and name in the corresponding places.

During testing, the script prints the evaluation metrics MSE and PSNR, and saves the harmonized outputs in ./results/experiment_name/latest_test/images/.
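For reference when reading the printed numbers, MSE and PSNR can be computed as below. The sketch assumes 8-bit images compared on the [0, 255] scale, with MSE averaged over all pixels and channels. It is not taken from test.py, whose exact averaging (for instance, per image versus over the whole test set) should be checked in the code.

```python
import numpy as np


def mse(pred, target):
    """Mean squared error over all pixels and channels."""
    diff = np.asarray(pred, dtype=np.float64) - np.asarray(target, dtype=np.float64)
    return float(np.mean(diff ** 2))


def psnr(pred, target, data_range=255.0):
    """Peak signal-to-noise ratio in dB; infinite for identical images."""
    err = mse(pred, target)
    if err == 0:
        return float("inf")
    return float(10.0 * np.log10(data_range ** 2 / err))
```

Since PSNR is a decreasing function of MSE, the two metrics rank a single image identically; they can rank methods differently only once scores are averaged over many images.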

Apply a pre-trained DoveNet model

Our pre-trained model is available on Baidu Cloud (access code: 8q8a) and OneDrive. Download and save it at ./checkpoints/experiment_name_pretrain/latest_net_G.pth.

Since both instance normalization and batch normalization perform well for our task, the model provided here is the one using batch normalization.

To test its performance on the iHarmony4 dataset, run:

python test.py --dataset_root <path_to_iHarmony4_dataset> --name experiment_name_pretrain --model dovenet --dataset_mode iharmony4 --netG s2ad --is_train 0  --norm batch --no_flip --preprocess none --num_test 7404

Remember to specify dataset_root and name in the corresponding places.

Baselines

Here, we provide the code of the baselines used in our paper "DoveNet: Deep Image Harmonization via Domain Verification", which was accepted by CVPR2020. Refer to Bibtex for more details.

1. Lalonde and Efros

J.-F. Lalonde et al. provide their implementation of the paper "Using color compatibility for assessing image realism" (ICCV2007) in their GitHub.

We have rearranged the code so that it is "click-and-run": demo.m is available in /lalonde/colorStatistics/mycode/demo/. Don't forget to specify the paths of the code and results on your computer in getPathName.m, and run setPath.m before running demo.m to get everything ready.

2. Xue et al.

This is Xue et al.'s implementation of their paper "Understanding and improving the realism of image composites" (ACM Transactions on Graphics, 2012).

demo.m is available in /xue/demo/.

Remember to add the paths of all dependent files using addpath(genpath('../dependency')).

3. Zhu et al.

Jun-Yan Zhu released the code of their paper "Learning a discriminative model for the perception of realism in composite images" (ICCV2015) in their GitHub.

Note that it requires the matcaffe interface. We made some changes to adapt it to our dataset, including how the data are preprocessed and how the harmonized results are saved. Don't forget to specify DATA_DIR, MODEL_DIR and RST_DIR before running demo.m.

The pre-trained models of Zhu et al. can also be found in Baidu Cloud and OneDrive. Remember to put them under MODEL_DIR.

4. DIH

Tsai et al. released the pre-trained Caffe model of their paper "Deep Image Harmonization" (CVPR2017) in their GitHub. Our code is a TensorFlow implementation based on the released Caffe network.

Besides, we discard one innermost convolutional layer and one innermost deconvolutional layer to make the network suitable for inputs of size 256*256. DIH uses a segmentation branch to propagate semantics to the harmonization branch, which brings considerable improvements. So here we implement two versions, DIH without segmentation branch and DIH with segmentation branch, corresponding to DIH(w/o semantics) and DIH in their paper.

without segmentation branch

We discard the scene parsing branch and preserve the remaining encoder-decoder structure and skip links. This is the version used as one of the baselines in our paper.

To train DIH(w/o semantics), under the folder wo_semantics/, run:

python train.py --data_dir <Your Path to Dataset> --init_lr 0.0001 --batch_size 32

Don't forget to specify the directory of the Image Harmonization Dataset after data_dir.

Our trained model can be found in Baidu Cloud and OneDrive. To test and reproduce the results, remember to put the model under /dih/wo_semantics/model/ and run:

python test.py --batch_size 1

with segmentation branch

The structure is implemented the same as the Caffe network. In DIH, the joint network is pre-trained on a synthesized composite dataset constructed from ADE20K, which provides image segmentations. We use HCOCO instead, leveraging the existing segmentations of COCO images from the COCO-Stuff dataset (GitHub). In our experiment, we leverage the object segmentations to pre-train the network. After downloading the dataset, preprocess the PNG segmentations to filter out stuff labels. Remember to rename each segmentation to the same name as its corresponding real image and put them under <Your Path to Dataset>/HCOCO/object_segmentations/. Then, we freeze the segmentation branch and finetune the harmonization branch using the whole dataset.
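The preprocessing step can be sketched as follows. The label layout is an assumption: we take values 0 to 90 as object ("thing") classes, larger values as stuff, and 255 as unlabeled, so verify `THING_MAX_LABEL` and `IGNORE` against the label list distributed with COCO-Stuff. Renaming is modeled as keeping the real image's file stem with a `.png` extension, which is also an assumption; both helper names are hypothetical.

```python
from pathlib import Path

import numpy as np

# ASSUMPTION: labels 0..THING_MAX_LABEL are object ("thing") classes, larger
# values are stuff classes, and IGNORE marks unlabeled pixels.
THING_MAX_LABEL = 90
IGNORE = 255


def keep_object_labels(seg: np.ndarray) -> np.ndarray:
    """Replace stuff labels with IGNORE, leaving object labels unchanged."""
    out = seg.copy()
    out[seg > THING_MAX_LABEL] = IGNORE
    return out


def segmentation_target_path(dataset_root: str, real_image_name: str) -> Path:
    """Destination of a renamed segmentation: same stem as the real image (assumed)."""
    stem = Path(real_image_name).stem
    return Path(dataset_root) / "HCOCO" / "object_segmentations" / f"{stem}.png"
```

Reading and writing the PNG files themselves can be done with any image library that preserves 8-bit label values without color conversion.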

To pre-train this model, under the folder with_semantics/, run:

python train_seg.py --data_dir <Your Path to Dataset> --init_lr 0.0001 --batch_size 32

After that, freeze the segmentation branch and finetune harmonization branch. Run:

python finetune.py --data_dir <Your Path to Dataset> --init_lr 0.0001 --batch_size 32 --pretrain False

Specify the directory of the Image Harmonization Dataset after data_dir.

Our trained model can be found in Baidu Cloud and OneDrive. To test and reproduce the results, remember to put the model under /dih/with_semantics/model/ and run:

python test_seg.py --batch_size 1

5. S²AM

Cun and Pan released the code and model of their paper "Improving the Harmony of the Composite Image by Spatial-Separated Attention Module" (TIP2020) on GitHub. They provide a model trained on their SCOCO and S-Adobe5k datasets as well as models trained on each sub-dataset of iHarmony4 individually. To ensure a fair comparison, we adopt the same training strategy and train the S²AM model on the merged training set of the four sub-datasets using the released code from their GitHub. The trained model can be found in Baidu Cloud (access code: 92tj) and OneDrive.

6. SSH

Yifan Jiang et al. provide the inference code and pretrained weights of their paper "SSH: A Self-Supervised Framework for Image Harmonization" (ICCV2021) in their GitHub. Note that SSH uses a different input format (separate foreground and background images) and different training data (image crops processed with LUTs), which do not strictly match our setting. Here we directly test the released model from the official SSH GitHub, regarding the composite image as the input to the SSH testing pipeline and the ground-truth real image as the reference. We made some changes to demo.ipynb to adapt it to our dataset and converted it to test.py, which can be found in Baidu Cloud (access code: 2tdg) and OneDrive. The test images are resized to 256*256 for a fair comparison.

To test and re-produce the results, remember to put the pretrained weight downloaded from the official SSH GitHub under ./, modify the data paths in test.py, and run:

python test.py

Experiments

When conducting experiments, we merge the training sets of the four sub-datasets into one training set to train the model, and evaluate it on the test set of each sub-dataset as well as on the whole test set. Here we show the results of recent baselines on our iHarmony4 dataset in terms of the MSE and PSNR metrics. In addition, we provide the fMSE (foreground MSE) score on the whole test set to facilitate future study. The papers, code and models of image harmonization methods are summarized in Awesome-image-harmonization. The following leaderboard is based on the fMSE metric. This leaderboard is no longer updated; for the up-to-date leaderboard, please see here.

Method	Extra info	All fMSE	All MSE	All PSNR	HCOCO MSE	HCOCO PSNR	HAdobe5k MSE	HAdobe5k PSNR	HFlickr MSE	HFlickr PSNR	Hday2night MSE	Hday2night PSNR
input composite	-	1387.30	172.47	31.63	69.37	33.94	345.54	28.16	264.35	28.32	109.65	34.01
CDTNet [CVPR2022]	-	252.05	23.75	38.23	16.25	39.15	20.62	38.24	68.61	33.55	36.72	37.95
iSSAM [WACV2021]^@	-	264.96	24.44	38.19	16.48	39.16	21.88	38.08	69.67	33.56	40.59	37.72
D-HT [ICCV2021]	-	320.78	30.30	37.55	16.89	38.76	38.53	36.88	74.51	33.13	53.01	37.10
iDIH [WACV2021]	-^#	341.77	31.71	37.14	19.51	38.40	33.81	36.39	86.44	32.60	49.94	37.01
iDIH [WACV2021]	S	252.00	22.00	38.31	14.01	39.64	21.36	37.35	60.41	34.03	50.61	37.68
Guo et al. [CVPR2021]	-	400.29	38.71	35.90	24.92	37.16	43.02	35.20	105.13	31.34	55.53	35.96
BargainNet [ICME2021]	-	405.23	37.82	35.88	24.84	37.03	39.94	35.34	97.32	31.34	50.98	35.67
Hao et al. [BMVC2020]⁺	-	437.90	38.46	35.91	23.44	37.33	39.22	34.80	112.39	31.29	49.73	36.96
RainNet [CVPR2021]	-	469.60	40.29	36.12	-	37.08	-	36.22	-	31.64	-	34.83
S²AM [TIP2020]^*	-	481.79	48.00	35.29	33.07	36.09	48.22	35.34	124.53	31.00	48.78	35.60
DoveNet [CVPR2020]	-	549.96	52.36	34.75	36.72	35.83	52.32	34.34	133.14	30.21	54.05	35.18
DIH [CVPR2017]	-	773.18	76.77	33.41	51.85	34.69	92.65	32.28	163.38	29.55	82.34	34.62
DIH [CVPR2017]	S	769.79	76.63	33.50	49.63	34.80	95.41	32.29	168.62	29.58	68.81	35.51
SSH [ICCV2021]^%	-	1140.66	89.23	32.77	73.20	34.03	115.03	31.73	266.56	28.68	98.84	34.56
Xue et al. [TOG2012]	-	1411.40	155.87	31.40	77.04	33.32	274.15	28.79	249.54	28.32	190.51	31.24
Lalonde and Efros [ICCV2007]	-	1433.21	150.53	30.16	110.10	31.14	158.90	29.66	329.87	26.43	199.93	29.80
Zhu et al. [ICCV2015]	-	1580.17	204.77	30.72	79.82	33.04	414.31	27.26	315.42	27.52	136.71	32.32

S in the Extra info column indicates that auxiliary semantic information is used for image harmonization.

*: Results of S²AM here are trained from scratch using the code from the official S²AM GitHub. In their GitHub repository, they also provide results trained on each sub-dataset individually, which we do not include here for a fair comparison.

+: Results of Hao et al. here are tested using the released model from the official GitHub since the results of their released model are not consistent with the reported results in their paper.

#: Results of iDIH backbone without auxiliary semantic information are tested using the released model from the official GitHub since they do not report the detailed results on each sub-dataset in their paper.

@: Note that iDIH and iSSAM listed in the table are two different backbones mentioned in the same paper [WACV2021]. Results of iSSAM backbone without auxiliary semantic information are also tested using the released model from the official GitHub since they do not report the fMSE metric in their paper.

%: Note that SSH has different input format (separate foreground and background image) and training data (image crops processed with LUTs), which does not strictly match our setting. Results here are tested using the released model from the official SSH GitHub. The composite image is regarded as the input in the testing pipeline of SSH, while the ground-truth real image is the reference.

Other results without any specifications are directly copied from our DoveNet or other published papers.
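The fMSE column differs from MSE only in its denominator: the squared error is averaged over foreground pixels instead of the whole image, so a small foreground is not diluted by the background. The sketch below assumes [0, 255] images, an H x W mask in {0, 1}, and averaging over foreground pixels and channels; the name `fmse` and the small `eps` guard are our own choices.

```python
import numpy as np


def fmse(pred, target, mask, eps=1e-8):
    """Squared error averaged over foreground pixels and channels only.

    pred, target: H x W x C arrays in [0, 255]; mask: H x W array in {0, 1}.
    """
    m = np.asarray(mask, dtype=np.float64)[..., None]  # H x W x 1
    sq = m * (np.asarray(pred, dtype=np.float64) - np.asarray(target, dtype=np.float64)) ** 2
    return float(sq.sum() / (m.sum() * sq.shape[-1] + eps))
```

For a 4×4 image whose only error is a value offset of 10 on a 2-pixel foreground, fMSE is 100 while the full-image MSE is 12.5, which is why fMSE values in the leaderboard are roughly an order of magnitude larger than MSE values.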

Here we also show some example results of different baselines on our dataset. More examples can be found in our main paper.

Besides, to evaluate the effectiveness of different methods in real scenarios, we also conduct a user study on 99 real composite images, 48 of which are from Xue et al. and 51 from Tsai et al. Below we present several results of different baselines on real composite images. The 99 real composite images can be found in Baidu Cloud and OneDrive. To visualize the comparison, we have put the results of different methods on all 99 real composite images in the Supplementary.

Other Resources

Bibtex

When using images from our dataset, please cite our paper using the following BibTeX [pdf] [supp] [arxiv]:

@inproceedings{DoveNet2020,
title={DoveNet: Deep Image Harmonization via Domain Verification},
author={Wenyan Cong and Jianfu Zhang and Li Niu and Liu Liu and Zhixin Ling and Weiyuan Li and Liqing Zhang},
booktitle={CVPR},
year={2020}}

bcmi/Image-Harmonization-Dataset-iHarmony4