
BMVC 2020: Train Scene Graph Generation models for Visual Genome and GQA in PyTorch >= 1.2 with improved zero- and few-shot generalization. Paper: "Graph Density-Aware Losses for Novel Compositions in Scene Graph Generation"


Scene Graph Generation

[Figure: an example image shown three ways - object detections, the ground-truth scene graph, and the generated scene graph]

In this visualization, woman sitting on rock is a zero-shot triplet, which means that the combination of woman, sitting on and rock has never been observed during training. Each object and the predicate has been observed individually, but only in combination with other objects and predicates; for example, woman sitting on chair has been observed, so it is not a zero-shot triplet. Making correct predictions for zero-shot triplets is very challenging, so in our paper we address this problem and improve both zero-shot and few-shot results.
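The zero-shot check itself is just set membership over the training triplets. A minimal sketch (all names below are hypothetical, not from the repo):

```python
# A test triplet (subject, predicate, object) is zero-shot if that exact
# combination never occurs in the training annotations, even though each
# element may occur individually.
train_triplets = {
    ("woman", "sitting on", "chair"),
    ("man", "sitting on", "rock"),
}

def is_zero_shot(triplet, train_triplets):
    """True if the full combination was never seen during training."""
    return triplet not in train_triplets

print(is_zero_shot(("woman", "sitting on", "rock"), train_triplets))   # True
print(is_zero_shot(("woman", "sitting on", "chair"), train_triplets))  # False
```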

This code accompanies our paper: Boris Knyazev, Harm de Vries, Cătălina Cangea, Graham W. Taylor, Aaron Courville, Eugene Belilovsky. "Graph Density-Aware Losses for Novel Compositions in Scene Graph Generation"

To run our experiments we used Rowan Zellers' excellent code for Neural Motifs. Its only problem is that it is difficult to run in PyTorch > 0.3, which makes it hard to use on some recent GPUs.

So, in this repo, I provide a cleaned-up version that runs in PyTorch 1.2 or later. The code is based on the Mask R-CNN implementation built into recent versions of torchvision. It should be possible to reproduce our GQA results using this code.

This code does not require building or manually downloading anything in advance. Training the Scene Graph Classification (SGCls) model with our loss on Visual Genome is as easy as running this command:

python main.py -data data_path -loss dnorm

The script will automatically download all data and create the following directories (make sure you have at least 30GB of free disk space in data_path; a quick free-space check is sketched after the tree):

data_path
│   VG
│   │   VG.tar
│   │   VG_100K (this will appear after extracting VG.tar)
│   │   ...
│
└───GQA
│   │   GQA_scenegraphs.tar
│   │   sceneGraphs (this will appear after extracting GQA_scenegraphs.tar)
│   │   ...
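Since the script downloads roughly 30GB of data, checking free space before launching can save time. A minimal stdlib sketch (the helper below is hypothetical, not part of the repo):

```python
# Verify that the target directory has enough free space for the download.
import shutil

def check_free_space(path, required_gb=30):
    """Raise if `path` has less than `required_gb` GB of free disk space."""
    free_gb = shutil.disk_usage(path).free / 1e9
    if free_gb < required_gb:
        raise RuntimeError(f"only {free_gb:.1f}GB free at {path}, "
                           f"need at least {required_gb}GB")
    print(f"{free_gb:.1f}GB free at {path} -- OK")

check_free_space(".")  # pass your actual data_path here
```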

To run it on GQA, use:

python main.py -data data_path -loss dnorm -split gqa -lr 0.002

Checkpoints and predictions will be saved locally in ./results; this can be changed with the -save_dir flag. See the examples below.

This repository is still a work in progress; please report any issues.

Requirements

  • Python > 3.5
  • PyTorch >= 1.2
  • Other standard libraries

Installing these libraries (in addition to PyTorch) should be enough:

conda install -c anaconda h5py cython dill pandas
conda install -c conda-forge pycocotools tqdm
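Optionally, a quick sanity check that the environment is set up correctly (a sketch, not part of the repo):

```python
# Confirm the PyTorch version and that the extra dependencies import cleanly.
import torch

assert tuple(int(v) for v in torch.__version__.split('.')[:2]) >= (1, 2), \
    "PyTorch >= 1.2 is required"

import h5py, dill, pandas, pycocotools, tqdm  # noqa: F401

print(f"PyTorch {torch.__version__}, CUDA available: {torch.cuda.is_available()}")
```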

Results in this repo were obtained on a single 1080 Ti / 2080 Ti GPU; up to 11GB of GPU memory and 32GB of RAM were required.

TODO

  • Message Passing with Mask R-CNN
  • Automatically download all files required to run the code
  • Obtain SGCls/PredCls results on VG and GQA
  • Obtain SGGen results on VG and GQA
  • Add trained checkpoints
  • Add the code to visualize scene graph generation on GQA using the trained checkpoint

VG Results

Results here are obtained using Mask R-CNN with ResNet-50 as a backbone, while in the paper we used Faster R-CNN with VGG16 as a backbone. We also skip a refinement step in this repo, which is usually required to improve SGGen results. Hence there's some difference in results from the paper. See full details in the paper.

| Loss | Detector | SGCls-R@100 | SGCls-R_ZS@100 | PredCls-R@50 | PredCls-R_ZS@50 |
|---|---|---|---|---|---|
| Baseline, this repo | Mask R-CNN (ResNet-50) pretrained on COCO | 47.1 | 7.8 | 74.5 | 23.5 |
| D-norm (ours), this repo | Mask R-CNN (ResNet-50) pretrained on COCO | 47.4 | 9.0 | 75.4 | 27.3 |
| D-norm (ours), paper | Faster R-CNN (VGG16) pretrained on VG | 48.6 | 9.1 | 78.2 | 28.4 |

Can be reproduced by running: python main.py -data data_path -loss dnorm -save_dir VG_sgcls

Or download our VG-SGCls-1 checkpoint
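For reference, the R@K and R_ZS@K columns follow the standard triplet-recall protocol: for each image, take the model's top-K most confident triplets and measure the fraction of ground-truth triplets recovered; R_ZS restricts the ground truth to zero-shot triplets. A simplified sketch that ignores the box-IoU matching the full protocol also requires (function names are hypothetical):

```python
# Simplified triplet recall@K for one image; dataset-level numbers average
# these per-image values.
def recall_at_k(ranked_triplets, gt_triplets, k=100):
    """ranked_triplets: (subject, predicate, object) tuples, best first.
    gt_triplets: set of ground-truth triplets for the image."""
    top_k = set(ranked_triplets[:k])
    return len(top_k & gt_triplets) / max(len(gt_triplets), 1)

def zero_shot_recall_at_k(ranked_triplets, gt_triplets, train_triplets, k=100):
    """Same, but only ground-truth triplets unseen in training count,
    which is what the R_ZS@K columns report."""
    zs_gt = {t for t in gt_triplets if t not in train_triplets}
    if not zs_gt:
        return None  # image contributes nothing to the zero-shot average
    return recall_at_k(ranked_triplets, zs_gt, k)
```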

Scene Graph Generation on VG

| Loss | Detector | SGGen-R@100 | SGGen-R_ZS@100 | SGGen-mR@100 |
|---|---|---|---|---|
| Baseline, this repo | Mask R-CNN (ResNet-50) pretrained on VG | 26.4 | 1.0 | 6.3 |
| D-norm (ours), this repo | Mask R-CNN (ResNet-50) pretrained on VG | 26.5 | 1.4 | 9.5 |
| D-norm (ours), paper | Mask R-CNN (ResNet-50) pretrained on VG | 28.2 | 1.2 | 9.5 |
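The mR@K column reports mean recall: recall is computed separately for each predicate class and then averaged, so rare predicates count as much as frequent ones. A simplified per-image sketch (hypothetical helper, again ignoring box matching):

```python
# Mean recall@K: per-predicate recall, averaged over predicate classes.
from collections import defaultdict

def mean_recall_at_k(ranked_triplets, gt_triplets, k=100):
    top_k = set(ranked_triplets[:k])
    per_pred_gt = defaultdict(set)
    for t in gt_triplets:              # t = (subject, predicate, object)
        per_pred_gt[t[1]].add(t)
    recalls = [len(top_k & gts) / len(gts) for gts in per_pred_gt.values()]
    return sum(recalls) / max(len(recalls), 1)
```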

Steps to reproduce the results above:

  1. Fine-tune Mask R-CNN on VG: python pretrain_detector.py stanford data_path ./pretrain_VG # takes about 1 day. Or download our VG-detector checkpoint. (A sketch of the head replacement such fine-tuning typically involves follows these steps.)

  2. Train SGCls: python main.py -data data_path -loss dnorm -ckpt pretrain_VG/gqa_maskrcnn_res50fpn.pth -save_dir VG_sgdet # takes about 1 day. Or download our VG-SGCls-2 checkpoint. This checkpoint is different from VG-SGCls-1, because here the model is trained on the features of the VG-pretrained detector. This checkpoint can be used in the next step.

  3. Evaluate SGGen: python main.py -data data_path -ckpt ./VG_sgdet/vgrel.pth -m sgdet -nepoch 0 # takes a couple of hours
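Step 1 is handled entirely by pretrain_detector.py, but for orientation, adapting a COCO-pretrained torchvision Mask R-CNN to a new label set typically means swapping its box and mask heads, roughly like this (a sketch under that assumption, not the repo's code; num_classes is a placeholder):

```python
# Standard torchvision pattern for adapting Mask R-CNN to new classes.
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor
from torchvision.models.detection.mask_rcnn import MaskRCNNPredictor

num_classes = 151  # e.g. 150 object classes + background (placeholder)

model = torchvision.models.detection.maskrcnn_resnet50_fpn(pretrained=True)

# Replace the box classification/regression head.
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)

# Replace the mask prediction head.
in_features_mask = model.roi_heads.mask_predictor.conv5_mask.in_channels
model.roi_heads.mask_predictor = MaskRCNNPredictor(in_features_mask, 256, num_classes)
```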

GQA Results

In this repo, I use a slightly different edge model in UnionBoxesAndFeats to avoid compiling custom code and to simplify the pipeline. This can contribute to the difference in results. Using Neural Motifs' edge model should lead to results closer to the paper.
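The geometric core of any such edge model is the union region of a subject-object pair, i.e. the tightest box enclosing both boxes, from which pairwise features are extracted (illustrative sketch, not the repo's exact implementation):

```python
# The union box of a subject/object pair: the smallest box enclosing both.
# Boxes are (x1, y1, x2, y2) in pixel coordinates.
def union_box(box_a, box_b):
    return (min(box_a[0], box_b[0]), min(box_a[1], box_b[1]),
            max(box_a[2], box_b[2]), max(box_a[3], box_b[3]))

print(union_box((10, 20, 50, 60), (40, 10, 90, 55)))  # (10, 10, 90, 60)
```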

| Loss | Detector | SGCls-R@100 | SGCls-R_ZS@100 | PredCls-R@50 | PredCls-R_ZS@50 |
|---|---|---|---|---|---|
| Baseline, this repo | Mask R-CNN (ResNet-50) pretrained on COCO | 27.1 | 2.9 | 58.4 | 33.1 |
| D-norm (ours), this repo | Mask R-CNN (ResNet-50) pretrained on COCO | 27.4 | 3.1 | 59.6 | 36.0 |
| D-norm (ours), paper | Mask R-CNN (ResNet-50) pretrained on COCO | 27.6 | 3.0 | 61.0 | 37.2 |

Can be reproduced by running: python main.py -data data_path -loss dnorm -split gqa -lr 0.002 -save_dir GQA_sgcls # takes about 1 day. Or download our GQA-SGCls-1 checkpoint
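A downloaded checkpoint can be inspected before use, e.g. to see what it stores (a sketch; the path and the stored keys are placeholders that depend on how checkpoints were saved):

```python
# Load a checkpoint on CPU and peek at its contents before training/eval.
import torch

ckpt = torch.load("./GQA_sgdet/vgrel.pth", map_location="cpu")
print(type(ckpt))
if isinstance(ckpt, dict):
    print(list(ckpt.keys())[:10])  # e.g. model weights, optimizer state, epoch
```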

Scene Graph Generation on GQA

| Loss | Detector | SGGen-R@300 | SGGen-R_ZS@300 | SGGen-mR@300 |
|---|---|---|---|---|
| Baseline, this repo | Mask R-CNN (ResNet-50) pretrained on GQA | 6.2 | 0.5 | 1.3 |
| D-norm (ours), this repo | Mask R-CNN (ResNet-50) pretrained on GQA | 6.3 | 0.7 | 2.4 |
| D-norm (ours), paper | Mask R-CNN (ResNet-50) pretrained on GQA | 4.6 | 0.4 | 2.0 |

Steps to reproduce the results above:

  1. Fine-tune Mask R-CNN on GQA: python pretrain_detector.py gqa data_path ./pretrain_GQA # takes about 1 day. Or download our GQA-detector checkpoint

  2. Train SGCls: python main.py -data data_path -lr 0.002 -split gqa -nosave -loss dnorm -ckpt pretrain_GQA/gqa_maskrcnn_res50fpn.pth -save_dir GQA_sgdet # takes about 1 day. Or download our GQA-SGCls-2 checkpoint. This checkpoint is different from GQA-SGCls-1, because here the model is trained on the features of the GQA-pretrained detector. This checkpoint can be used in the next step.

  3. Evaluate SGGen: python main.py -data data_path -split gqa -ckpt ./GQA_sgdet/vgrel.pth -m sgdet -nosave -nepoch 0 # takes a couple of hours

Scene Graph Visualizations

Citation

Please use this BibTeX entry if you want to cite our paper:

@inproceedings{knyazev2020graphdensity,
  title={Graph Density-Aware Losses for Novel Compositions in Scene Graph Generation},
  author={Knyazev, Boris and de Vries, Harm and Cangea, Cătălina and Taylor, Graham W and Courville, Aaron and Belilovsky, Eugene},
  booktitle={British Machine Vision Conference (BMVC)},
  pdf={http://arxiv.org/abs/2005.08230},
  year={2020}
}