PUT

Papers: 'Transformer based Pluralistic Image Completion with Reduced Information Loss' (TPAMI 2024) and 'Reduce Information Loss in Transformers for Pluralistic Image Inpainting' (CVPR 2022)


News

[13/08/2024] Controllable image inpainting is available! In addition, a demo UI is provided to support interactive image editing!

More news:
  • [10/05/2024] The main branch has been updated to support our TPAMI 2024 paper. Currently, only the uncontrollable image inpainting models are provided. The models and code for controllable image inpainting will come soon. Please be patient. The original repository for the CVPR 2022 paper is available at this url.

  • [21/04/2024] The extension paper "Transformer based Pluralistic Image Completion with Reduced Information Loss" has been accepted to TPAMI 2024. The final PDF is available on arXiv. The improved PUT inpaints images with much better quality and 20x less inference time! Controllable image inpainting is also supported. In addition, more discussions are provided, including a comparison with some popular masked image modeling methods. The code will be updated when I am free. Please be patient.

Introduction

This repo is the official implementation of our CVPR 2022 paper Reduce Information Loss in Transformers for Pluralistic Image Inpainting and our TPAMI 2024 paper Transformer based Pluralistic Image Completion with Reduced Information Loss. It also serves as a codebase for several tasks, and is especially friendly to image synthesis tasks.

In our internal works, we have re-implemented several methods with the help of this repo, including ICT, DALL-E, Taming-transformers, Edge-Connect, and so on.

Enjoy the code, and we hope you find it convenient for producing more awesome works!

Overview

Pipeline for uncontrollable image inpainting

Pipeline for controllable image inpainting

Some results

  • Results for resolution 256x256, uncontrollable.

  • Results for resolution 512x512, uncontrollable.

  • Results for resolution 256x256, controllable.

  • Effectiveness of unknown category strategy in controllable image inpainting.

Data preparation

Please refer to prepare_data for details.

Installation

Refer to install_instruction.sh.

Training

Uncontrollable image inpainting

For each dataset, the training procedure is divided into two stages: (1) training of P-VQVAE, and (2) training of UQ-Transformer, which requires the pretrained P-VQVAE. The training command is very simple:

python train_net --name exp_name --config_file path/to/config.yaml --num_node 1 --tensorboard --auto_resume

For example:

python train_net --name cvpr2022_p_vqvae_ffhq --config_file configs/put_cvpr2022/ffhq/p_vqvae_ffhq.yaml --num_node 1 --tensorboard --auto_resume

You can also override the configurations in the given yaml file from the command line:

python train_net --name cvpr2022_p_vqvae_ffhq --config_file configs/put_cvpr2022/ffhq/p_vqvae_ffhq.yaml --num_node 1 --tensorboard --auto_resume dataloader.batch_size 2 solver.base_lr 1.0e-4 dataloader.data_root DATASET

NOTE: The training settings are entirely controlled by the given yaml config file, so making a good yaml config file is quite important! The training logs, models, and sampled images are all saved to ./OUTPUT/exp_name.

The default training commands are provided in scripts/train_commands_cvpr2022.sh and scripts/train_commands_tpami2024.sh. Note that the batch size, number of nodes, and the number of GPUs should be adjusted according to your machine.
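
For reference, the dotted keys in the command above map onto nested entries of the yaml config. The following is a minimal, illustrative sketch of how such overrides can be merged into a loaded config; the apply_overrides helper is hypothetical and not part of this repository.

# Hypothetical sketch (not part of this repo): merge dotted command-line
# overrides such as "dataloader.batch_size 2" into a nested yaml config.
import yaml

def apply_overrides(cfg, overrides):
    """Apply flat 'dotted.key value' pairs to a nested config dict."""
    for key, value in zip(overrides[0::2], overrides[1::2]):
        node = cfg
        parts = key.split(".")
        for part in parts[:-1]:
            node = node.setdefault(part, {})
        node[parts[-1]] = yaml.safe_load(value)  # reuse yaml typing for ints/floats
    return cfg

with open("configs/put_cvpr2022/ffhq/p_vqvae_ffhq.yaml") as f:
    cfg = yaml.safe_load(f)
cfg = apply_overrides(cfg, ["dataloader.batch_size", "2", "solver.base_lr", "1.0e-4"])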

Controllable image inpainting

Based on the image P-VQVAE pretrained for uncontrollable image inpainting, three additional stages are required for each dataset:

  • Train a semantic P-VQVAE for encoding the segmentation map;

  • Train a structure P-VQVAE for encoding the sketch map;

  • Train UQ-Transformer with the pretrained image P-VQVAE, semantic encoder and structure encoder.

Note that training the semantic and structure P-VQVAEs requires additional data. Please refer to prepare_data for more details. Once the data has been prepared, the training commands are also very simple; please refer to scripts/train_commands_tpami2024_conditional.sh for examples.
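
As a quick sanity check before the final stage, something like the sketch below can verify that the pretrained image, semantic, and structure P-VQVAE checkpoints are in place. The experiment names are placeholders; the OUTPUT/exp_name/checkpoint/last.pth layout follows the convention used elsewhere in this README.

# Hypothetical pre-flight check: the experiment names below are placeholders.
from pathlib import Path

required = [
    "image_p_vqvae_exp_name",      # from uncontrollable training
    "semantic_p_vqvae_exp_name",   # segmentation-map encoder
    "structure_p_vqvae_exp_name",  # sketch-map encoder
]
missing = [n for n in required
           if not Path("OUTPUT", n, "checkpoint", "last.pth").is_file()]
if missing:
    raise FileNotFoundError(f"Missing pretrained checkpoints: {missing}")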

Inference

We provide several inference functions in ./scripts/inference.py. First of all, you need to train a model or download the pretrained models from OneDrive or BaiduYunpan (code: 6po2) and put them into ./OUTPUT/.

  1. For image reconstruction:
python scripts/inference.py --name OUTPUT/pvqvae_exp_name/checkpoint/last.pth --func inference_reconstruction --gpu 0 --batch_size 8
  2. For image inpainting with the provided/trained transformer model:
python scripts/inference.py --func inference_inpainting \
--name  OUTPUT/transformer_exp_name/checkpoint/last.pth \
--input_res 256,256 \
--num_token_per_iter 1 \                                                # if given like '1,2,5', the script will loop for each of them
--num_token_for_sampling 50 \                                           # if given like '50,100', the script will loop for each of them
--image_dir path/to/images \
--mask_dir path/to/masks \
--save_masked_image \                                                   # whether to save the masked images 
--save_dir path/to/save \
--num_sample 1 \                                                        # the number of inpainted results for each image-mask pair
--gpu 0                                                                 # GPU ID to use. If not given, DDP is performed   

The results will be saved to ./RESULTS/transformer_exp_name/path/to/save. Please refer to scripts/inference_commands_cvpr2022.sh and scripts/inference_commands_tpami2024.sh for more details. Some image-mask pairs are provided for each dataset in ./data; you can try to inpaint some images with the commands provided in these two scripts (the models need to be downloaded and put into ./OUTPUT).
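
Since --image_dir and --mask_dir are expected to contain matching image-mask pairs, a small helper like the sketch below can be used to check the pairing before inference. The filename-matching rule is an assumption made for illustration, not something enforced by the inference script.

# Illustrative sketch (assumption: images and masks share file stems).
from pathlib import Path

def pair_images_and_masks(image_dir, mask_dir):
    """Return (image, mask) path pairs matched by file stem."""
    masks = {p.stem: p for p in Path(mask_dir).iterdir() if p.is_file()}
    pairs = []
    for img in sorted(p for p in Path(image_dir).iterdir() if p.is_file()):
        mask = masks.get(img.stem)
        if mask is None:
            print(f"warning: no mask found for {img.name}")
            continue
        pairs.append((img, mask))
    return pairs

# Example with the sample data shipped in ./data.
pairs = pair_images_and_masks("data/ffhq_256_sample/gt", "data/ffhq_256_sample/mr0.5_0.6")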

Evaluation

After some results have been generated, the metrics can be obtained by:

sh scripts/metrics/cal_metrics.sh path/to/gt path/to/result
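
cal_metrics.sh covers the metrics reported in the papers. For a quick sanity check on a few image pairs, per-image PSNR/SSIM can also be computed directly with scikit-image, as in the minimal sketch below (illustrative only, not the repository's evaluation protocol).

# Illustrative per-image PSNR/SSIM check (requires scikit-image >= 0.19);
# use scripts/metrics/cal_metrics.sh for the metrics reported in the papers.
from pathlib import Path

import numpy as np
from PIL import Image
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def compare_folders(gt_dir, result_dir):
    for gt_path in sorted(Path(gt_dir).glob("*.png")):
        result_path = Path(result_dir) / gt_path.name  # assumes matching filenames and sizes
        if not result_path.is_file():
            continue
        gt = np.array(Image.open(gt_path).convert("RGB"))
        res = np.array(Image.open(result_path).convert("RGB"))
        psnr = peak_signal_noise_ratio(gt, res, data_range=255)
        ssim = structural_similarity(gt, res, channel_axis=-1, data_range=255)
        print(f"{gt_path.name}: PSNR={psnr:.2f} dB, SSIM={ssim:.4f}")

compare_folders("path/to/gt", "path/to/result")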

The diversity can be evaluated by:

python scripts/metrics/cal_lpips.py  --path1 path/to/results_dir  --device cuda:0
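
cal_lpips.py computes the diversity score used in the papers; the sketch below illustrates the underlying idea with the lpips package, averaging pairwise LPIPS distances over multiple inpainted samples of the same image (the sample paths are placeholders).

# Illustrative diversity sketch with the lpips package (placeholder paths);
# use scripts/metrics/cal_lpips.py for the score reported in the papers.
from itertools import combinations

import lpips
import torch

loss_fn = lpips.LPIPS(net="alex")

# Several inpainted results of the same image-mask pair (e.g. --num_sample > 1).
sample_paths = ["sample_0.png", "sample_1.png", "sample_2.png"]
samples = [lpips.im2tensor(lpips.load_image(p)) for p in sample_paths]  # tensors in [-1, 1]

with torch.no_grad():
    dists = [loss_fn(a, b).item() for a, b in combinations(samples, 2)]
print(f"mean pairwise LPIPS (diversity): {sum(dists) / len(dists):.4f}")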

Interactive Controllable Inpainting

Interactive controllable inpainting is supported via scripts/image_completion_with_ui_conditional.py. Before using it, you need to download several pretrained models:

  • DexiNed, which is used to extract the sketch from the provided image. It can be downloaded from BaiduYunpan (code: 6po2). Put the downloaded model weights into ./OUTPUT/DexiNed.

  • Mask2Former, which is used to get the semantic segmentation map from the provided image. It can be downloaded from BaiduYunpan (code: 6po2). Put the downloaded model weights into ./OUTPUT/Mask2Former.

  • The pretrained transformers tpami2024_vit_base_ffhq_seg_sketch_dual_encoder_res256, tpami2024_vit_base_naturalscene_seg_sketch_dual_encoder_res256, and tpami2024_vit_base_imagenet_seg_sketch_dual_encoder_res256 for controllable inpainting. They can also be downloaded from BaiduYunpan (code: 6po2). Put the downloaded model weights into ./OUTPUT.

Then you can simply run the following command:

python scripts/image_completion_with_ui_conditional.py --name tpami2024_ffhq_256 --num_samples 8  --num_token_per_iter 10 --topk 200 --batch_size 4 --save_dir RESULT/inpainting_with_ui/ffhq_conditional --im_path data/ffhq_256_sample/gt --mask_path data/ffhq_256_sample/mr0.5_0.6 --ui

Citation

If you find our paper/code are helpful, please consider citing:

# TPAMI paper
@article{liu2024transformer,
  title={Transformer based Pluralistic Image Completion with Reduced Information Loss},
  author={Liu, Qiankun and Jiang, Yuqi and Tan, Zhentao and Chen, Dongdong and Fu, Ying and Chu, Qi and Hua, Gang and Yu, Nenghai},
  journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
  year={2024},
  volume={46},
  number={10},
  pages={6652-6668},
  doi={10.1109/TPAMI.2024.3384406},
  publisher={IEEE}
}

# CVPR paper
@inproceedings{liu2022reduce,
  title={Reduce Information Loss in Transformers for Pluralistic Image Inpainting},
  author={Liu, Qiankun and Tan, Zhentao and Chen, Dongdong and Chu, Qi and Dai, Xiyang and Chen, Yinpeng and Liu, Mengchen and Yuan, Lu and Yu, Nenghai},
  booktitle={IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2022)},
  year={2022}
}