Joint Probability Distribution Regression for Image Cropping（ACPD-Net）

This repository contains a pytorch implementation of the paper "Joint Probability Distribution Regression for Image Cropping"(Subject to ICIP2023)

In this paper, we present an Aesthetic and Composition Joing Probability Distribution Network~(ACPD-Net) to explicitly investigate the collaboration of image aesthetics and image composition for the cropping task in an end-to-end manner.

Motivation

Pipeline

Requirements

System: Linux(e.g. Ubuntu/CentOS/Arch), macOS, or Windows Subsystem of Linux (WSL)
Python version >=3.6
Pytorch == 1.10.2 Cuda == 11.2
TensorboardX
Opencv == 4.5.5
mmdet == 2.24.1

Install

Clone repo

git clone https://github.com/dafeigediaozhatian/ACPD-Net
cd ACPD-Net

Install dependencies(pytorch, scikit-learn, opencv-python, pandas. Recommend to use Anaconda.)

# Create a new conda environment
conda create -n ACPD-Net python==3.8
conda activate ACPD-Net

# Install other packages
conda env create -f requirements.yml

Pretrain model

AVA dataset and CADB dataset model
- Download the aesthetic pretrain model and put it in ./dataset/aesthetic_model.(download_link, passward：u9un)
- Download the composition pretrain model and put it in ./dataset/composition_model. (download_link, passward：p5uq)
- Download the detection pretrain model and put it in ./checkpoints/ (download_link ,passward：o3bs) The directory structure should be like:

|--checkpoints
   |--faster_rcnn_r50_caffe_c4_mstrain_1x_coco_20220316_150527-db276fed.pth
|--dataset
   |--aesthetic_model
      |--aesthetic-resnet50-model-epoch-10.pkl
   |--composition_model
      |--composition-resnet50-model-epoch-10.pkl

Training and test

Traning scripts for two datasets can be found in train_cropping.py. The dataroot argument should be modified to path_to_<dataset_name>. Run the follwing command for training:

# Training on FCDB and FLMS
python train_cropping.py

# Test the result
python test_cropping.py

The Ablation study on the aesthetics threshold θ

The ablation study on the parameter θ in Eq. 2, and we select θ=0.05 as the optical choice, where the average intersection-over-union (IoU) and the average boundary displacement error (BDE) as metrics.

The model size and inference speed of SOTA methods

The model parameters of our method are not the least, and its inference speed is not the fastest, but our model can meet real-time requirements (FPS > 60). All tests were conducted on Nvidia GTX 3090."

Visual results

Qualitative comparison and user study of different methods. Compared with other methods, our method can obtain better visually cropping results close to GT. Last row with the user study results show that most users favor our method.

Citation

@inproceedings{shi2023aesthetic,
  title={Joint Probability Distribution Regression for Image Cropping},
  author={Tengfei Shi, Chenglizhao Chen, Yuanbo He, Wenfeng Song, Aiming Hao},
  conference={ICIP2023},
  year={2023}
}

ZXMMD/ACPD-Net