HiLo: Exploiting High Low Frequency Relations for Unbiased Panoptic Scene Graph Generation

ICCV 2023, official code implementation, arXiv

Abstract

Panoptic Scene Graph generation (PSG) is a recently proposed task in image scene understanding that aims to segment the image and extract triplets of subjects, objects and their relations to build a scene graph. This task is particularly challenging for two reasons. First, it suffers from a long-tail problem in its relation categories, making naive biased methods more inclined to high-frequency relations. Existing unbiased methods tackle the long-tail problem by data/loss rebalancing to favor low-frequency relations. Second, a subject-object pair can have two or more semantically overlapping relations. While existing methods favor one over the other, our proposed HiLo framework lets different network branches specialize on low and high frequency relations, enforce their consistency and fuse the results. To the best of our knowledge we are the first to propose an explicitly unbiased PSG method. In extensive experiments we show that our HiLo framework achieves state-of-the-art results on the PSG task. We also apply our method to the Scene Graph Generation task that predicts boxes instead of masks and see improvements over all baseline methods.

Method

An overview of our HiLo framework with HiLo baseline. a) HiLo relation swapping module swaps the multiple relations in the subject-object pair to obtain H-L Data and L-H Data respectively. b) Input data into our HiLo framework with HiLo baseline model, there are two branches, namely H-L decoder and L-H decoder, which learn H-L Data and L-H Data respectively. c) In addition to task losses for PSG, we propose HiLo prediction alignment, which includes subject-object consistency loss and relation consistency loss, so that the parallel branch can be better optimized.

Results

Comparison between our HiLo and other methods on the PSG dataset. Our method shows superior performance compared to all previous methods.

Visualization

Visualization of panoptic segmentations and the top 20 predicted triplets compared with ground truth. The upper left is the original image, the lower left is the ground truth and on the right are the predictions. The highlighted triplets represent the subject-object pairs with multiple relations, where the blue highlights represent the high frequency relations and the red highlights represents the low frequency relations. The visualization results show that our method can predict both high frequency and low frequency relations.

Preparation

Dev environment:

git clone https://github.com/franciszzj/HiLo.git
cd HiLo
conda create --name hilo --file spec-file.txt
conda activate hilo

Pretrained models are directly converted from Mask2Former using this code.

python tools/change_model.py path/to/pretrained/model

Configs

Config path: ./configs/psgmask2former/

R50: psgmask2former_r50_hilo_baseline.py, psgmask2former_r50_hilo.py
Swin Base: psgmask2former_swin_b_hilo_baseline.py, psgmask2former_swin_b_hilo.py
Swin Large: psgmask2former_swin_l_hilo_baseline.py, psgmask2former_swin_l_hilo.py

Hyperparameter:

EVAL_PAN_RELS: For details, refer to issue#30, issue#60, and issue#100.
model.bbox_head.test_forward_output_type: 'high2low', 'low2high', and 'merge'.

Training

Train HiLo baseline:

PYTHONPATH='.':$PYTHONPATH \
EVAL_PAN_RELS=True \
python -m torch.distributed.launch --nproc_per_node=$GPUS --master_port=$PORT \
  tools/train.py path/to/hilo_baseline/config --auto-resume --no-validate --seed 666 --launcher pytorch

Obtaining a new training file through IETrans:

Note: you should also add gt_xxx in the test_pipeline. You can refer to example_config for specifics.

PYTHONPATH='.':$PYTHONPATH \
python tools/data_prepare/ietrans.py path/to/hilo_baseline/config path/to/checkpoint path/to/output

Train HiLo:

PYTHONPATH='.':$PYTHONPATH \
EVAL_PAN_RELS=True \
python -m torch.distributed.launch --nproc_per_node=$GPUS --master_port=$PORT \
  tools/train.py path/to/hilo/config --auto-resume --no-validate --seed 666 --launcher pytorch

Testing and Evaluation

Test and eval HiLo baseline:

PYTHONPATH='.':$PYTHONPATH \
EVAL_PAN_RELS=True \
python tools/test.py path/to/hilo_baseline/config path/to/checkpoint --eval sgdet_PQ

Test and eval HiLo:

PYTHONPATH='.':$PYTHONPATH \
EVAL_PAN_RELS=True \
python tools/test.py path/to/hilo/config path/to/checkpoint --eval sgdet_PQ --cfg-options model.bbox_head.test_forward_output_type='merge'

Processed Data and Trained Models

For the convenience to follow HiLo, we provide the PSG json file processed through IETrans, as well as a trained model and the config file saved from the training process for reference.

Note:

For the R50 model, we used use_shared_query=True. However, after multiple experiments, we found that the results for use_shared_query=True/False are similar. Therefore, we did not provide an R50 model with use_shared_query=False. While for the SwinB/SwinL models, they are use_shared_query=False models.
The results reported in the paper are with EVAL_PAN_RELS=False, for a fairer comparison with methods like PSGTR. However, we have implemented a more efficient post-processing method, where the performance with EVAL_PAN_RELS=True is similar to that with EVAL_PAN_RELS=False.

Backbone	PSG file (IETrans processed)	Converted Mask2Former	HiLo Baseline Model	Config (for HiLo train)	HiLo Model
R50	psg_ietrans.json	mask2former_r50_converted.pth	hilo_baseline_r50.pth	hilo_r50.py	hilo_r50.pth
SwinB	psg_ietrans_swin_b.json	mask2former_swin_b_converted.pth	hilo_baseline_swin_b.pth	hilo_swin_b.py	hilo_swin_b.pth
SwinL	psg_ietrans_swin_l.json	mask2former_swin_l_converted.pth	hilo_baseline_swin_l.pth	hilo_swin_l.py	hilo_swin_l.pth

Acknowledgements

HiLo is developed based on OpenPSG and MMDetection. Thanks for their great works!

Citation

If you find this repository useful, please cite:

@InProceedings{zhou2023hilo,
    author    = {Zhou, Zijian and Shi, Miaojing and Caesar, Holger},
    title     = {HiLo: Exploiting High Low Frequency Relations for Unbiased Panoptic Scene Graph Generation},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2023},
    pages     = {21637-21648}
}

DrFV/HiLo