Multi-scale Feature Polymerizer Aided Coalescing-attention for Object Placement

Official PyTorch Implementation for CA-GAN.

Pretrained Model

We provide models for TERSE [arXiv], PlaceNet [arXiv], GracoNet [arXiv] and our CA-GAN:

method	FID	ACC	LPIPS	model & logs
TERSE	46.88	68.8%	0	baidu disk (code: zkk8)
PlaceNet	37.01	69.2%	0.161	baidu disk (code: rap8)
GracoNet	28.10	82.9%	0.207	baidu disk (code: cayr)
CA-GAN	23.21	86.7%	0.270	baidu disk (code: 90yf)

Usage

Install Python 3.6 and PyTorch 1.9.1 (requires CUDA >= 10.2):

conda install pytorch==1.9.1 torchvision==0.10.1 torchaudio==0.9.1 cudatoolkit=10.2 -c pytorch

Data preparation

Download and extract the OPA dataset from the official link (Google Drive). We expect the following directory structure:

<PATH_TO_OPA>
  background/       # background images
  foreground/       # foreground images with masks
  composite/        # composite images with masks
  train_set.csv     # train annotation
  test_set.csv      # test annotation
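
Before preprocessing, it can help to confirm that the extracted dataset matches this layout. The following is a minimal sketch and is not part of the repository: the helper name `missing_opa_entries` is hypothetical, and the expected entries are copied from the structure above.

```python
import os

# Entries expected directly under <PATH_TO_OPA>, copied from the layout above.
EXPECTED_DIRS = ["background", "foreground", "composite"]
EXPECTED_FILES = ["train_set.csv", "test_set.csv"]


def missing_opa_entries(data_root):
    """Return the expected directories and files that are absent under data_root.

    Hypothetical helper for illustration only; it is not part of the CA-GAN code.
    """
    missing = []
    for name in EXPECTED_DIRS:
        if not os.path.isdir(os.path.join(data_root, name)):
            missing.append(name + "/")
    for name in EXPECTED_FILES:
        if not os.path.isfile(os.path.join(data_root, name)):
            missing.append(name)
    return missing
```

Calling `missing_opa_entries("<PATH_TO_OPA>")` returns an empty list when all five entries are present, and otherwise lists what is missing.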

Then run the preprocessing script:

python tool/preprocess.py --data_root <PATH_TO_OPA>

You will see some new files and directories:

<PATH_TO_OPA>
  com_pic_testpos299/          # test set positive composite images (resized to 299)
  train_data.csv               # transformed train annotation
  train_data_pos.csv           # train annotation for positive samples
  test_data.csv                # transformed test annotation
  test_data_pos.csv            # test annotation for positive samples
  test_data_pos_unique.csv     # test annotation for positive samples with different fg/bg pairs
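
To sanity-check the preprocessing output, one option is to look at the header and row count of each generated CSV. This is a rough sketch only: the function names are hypothetical, the file names are taken from the listing above, and the code assumes (without confirmation from the repository) that the first row of each file is a header.

```python
import csv
import os

# File names copied from the listing above.
GENERATED_CSVS = [
    "train_data.csv",
    "train_data_pos.csv",
    "test_data.csv",
    "test_data_pos.csv",
    "test_data_pos_unique.csv",
]


def summarize_csv(path):
    """Return (header, number_of_data_rows) for one CSV file.

    Assumes the first row is a header; this is not confirmed by the source.
    """
    with open(path, newline="") as f:
        reader = csv.reader(f)
        header = next(reader, [])
        n_rows = sum(1 for _ in reader)
    return header, n_rows


def summarize_preprocessed(data_root):
    """Map each generated CSV name to its summary, or None if the file is missing."""
    summary = {}
    for name in GENERATED_CSVS:
        path = os.path.join(data_root, name)
        summary[name] = summarize_csv(path) if os.path.isfile(path) else None
    return summary
```

A `None` entry in the result of `summarize_preprocessed("<PATH_TO_OPA>")` indicates that the corresponding file was not produced.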

Training

To train CA-GAN on a single RTX 3090 GPU with batch size 32 for 15 epochs, run:

python main.py --data_root <PATH_TO_OPA> --expid <YOUR_EXPERIMENT_NAME>

To reproduce the baseline models, replace main.py with main_terse.py, main_placenet.py, or main_graconet.py.

To monitor the losses during training, use TensorBoard:

tensorboard --logdir result/<YOUR_EXPERIMENT_NAME>/tblog --port <YOUR_SPECIFIED_PORT>

Inference

To predict composite images from a trained CA-GAN model, run:

python infer.py --data_root <PATH_TO_OPA> --expid <YOUR_EXPERIMENT_NAME> --epoch <EPOCH_TO_EVALUATE> --eval_type eval
python infer.py --data_root <PATH_TO_OPA> --expid <YOUR_EXPERIMENT_NAME> --epoch <EPOCH_TO_EVALUATE> --eval_type evaluni --repeat 10

To run inference with the baseline models, replace infer.py with infer_terse.py, infer_placenet.py, or infer_graconet.py.

You can also use our provided models directly. For example, to run inference with our best CA-GAN model, 1) download CA-GAN.zip from the table above, 2) place it under result and unzip it:

mv path/to/your/downloaded/CA-GAN.zip result/CA-GAN.zip
cd result
unzip CA-GAN.zip
cd ..

and 3) run:

python infer.py --data_root <PATH_TO_OPA> --expid CA-GAN --epoch 15 --eval_type eval
python infer.py --data_root <PATH_TO_OPA> --expid CA-GAN --epoch 15 --eval_type evaluni --repeat 10

The procedure for running inference with our provided baseline models is similar. Remember to use --epoch 11 for TERSE and GracoNet, and --epoch 9 for PlaceNet.
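
The script and epoch pairings for the provided models can be collected in one place. The sketch below is illustrative and not part of the repository: `PROVIDED_MODELS` and `inference_commands` are hypothetical names, the script names, epochs, and flags are the ones given in this README, and the `expid` for a downloaded baseline is left as a parameter because its directory name under result is not stated here.

```python
# Inference script and checkpoint epoch for each provided model, as stated in this README.
PROVIDED_MODELS = {
    "CA-GAN": ("infer.py", 15),
    "TERSE": ("infer_terse.py", 11),
    "PlaceNet": ("infer_placenet.py", 9),
    "GracoNet": ("infer_graconet.py", 11),
}


def inference_commands(method, data_root, expid, repeat=10):
    """Build the two inference commands (eval and evaluni) as argument lists.

    `expid` must match the experiment directory under result/. For the
    downloaded baselines that name is an assumption the reader should verify.
    """
    script, epoch = PROVIDED_MODELS[method]
    base = ["python", script, "--data_root", data_root,
            "--expid", expid, "--epoch", str(epoch)]
    return [
        base + ["--eval_type", "eval"],
        base + ["--eval_type", "evaluni", "--repeat", str(repeat)],
    ]
```

Each returned list can be passed to `subprocess.run(cmd, check=True)` from the repository root, which avoids retyping the epoch for every model.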

Evaluation

To evaluate FID score, run:

sh script/eval_fid.sh <YOUR_EXPERIMENT_NAME> <EPOCH_TO_EVALUATE> <PATH_TO_OPA/com_pic_testpos299>

To evaluate LPIPS score, run:

sh script/eval_lpips.sh <YOUR_EXPERIMENT_NAME> <EPOCH_TO_EVALUATE>

To evaluate the Accuracy score, please follow GracoNet.

Acknowledgements

Some of the evaluation code in this repo is borrowed and modified from OPA, FID-Pytorch, GracoNet and Perceptual Similarity. We thank the authors for their great work.

CodeGoat24/CA-GAN