
Detecting Unseen Visual Relations Using Analogies

Created by Julia Peyre at INRIA, Paris.

Introduction

This is the code for the paper:

Julia Peyre, Ivan Laptev, Cordelia Schmid, Josef Sivic, Detecting Unseen Visual Relations Using Analogies, ICCV19.

The webpage for this project is available here, with a link to the paper.

This code is available for research purposes (MIT License).

Contents

  1. Installation
  2. Data
  3. Train
  4. Test
  5. Evaluation
  6. Erratum

Installation

This code was tested with Python 2.7, PyTorch 0.4.0, and CUDA 8.0. Install the dependencies with:

pip install -r requirements.txt
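After installing, a quick sanity check (a minimal sketch, assuming only the standard PyTorch API) can confirm the expected versions and that CUDA is visible:

import sys
import torch

# Check that the environment matches the versions this code was tested with.
print("Python:", sys.version.split()[0])      # expected 2.7.x
print("PyTorch:", torch.__version__)          # expected 0.4.0
print("CUDA available:", torch.cuda.is_available())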

Data

We release data and pre-trained models for HICO-DET. To set up the directories, please follow these steps:

  1. Download the pre-computed data
wget https://www.rocq.inria.fr/cluster-willow/jpeyre/analogy/data.tar.gz
tar zxvf data.tar.gz

This should unpack into the ./data folder.
It contains the object detections, visual features, and database objects needed to run our code on HICO-DET.

  2. Download HICO images
    Download the HICO images and place them in the directory ./data/hico/images

  3. Link to the COCO API
    Download the COCO API into a new directory ./data/coco and run make

  4. Download pre-computed models and detections

wget https://www.rocq.inria.fr/cluster-willow/jpeyre/analogy/runs.tar.gz
tar zxvf runs.tar.gz

This should unpack into the ./runs folder. A small sketch to verify the resulting layout follows this list.
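Once the steps above are done, the following check (hypothetical, not part of the repository) verifies that the expected directories are in place:

import os

# Directories the data set-up steps above are expected to create.
expected = [
    "data",              # pre-computed detections, features, database objects
    "data/hico/images",  # HICO images
    "data/coco",         # COCO API
    "runs",              # pre-trained models and detections
]

for path in expected:
    print("{:20s} {}".format(path, "ok" if os.path.isdir(path) else "MISSING"))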

Train

You can re-train our model by running:

python train.py --config_path $CONFIG_PATH

We provide config files in the ./configs directory.
Feel free to edit the config options to train variants of our model; the sketch below shows one way to do this programmatically.
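For example, a config could be loaded, modified, and saved with a short script like the one below. This is a sketch only: the exact keys inside the YAML files are assumptions, so check the files in ./configs before relying on any of them.

import yaml

# Load one of the provided configs (file name taken from ./configs).
config_path = "configs/hico_trainvalzeroshot_analogy_vp.yaml"
with open(config_path) as f:
    config = yaml.safe_load(f)

print(sorted(config.keys()))    # inspect which options are available

# Change an option (here the analogy variant discussed in the Erratum section)
# and write the result out as a new config to pass to train.py.
config["analogy_type"] = "vp"
with open("configs/my_variant.yaml", "w") as f:
    yaml.safe_dump(config, f)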

Test

You can extract the detections by running:

python eval_hico.py --config_path $CONFIG_PATH

To extract the detections using our analogy model, you can run:

python eval_hico_analogy.py --config_path $CONFIG_PATH
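If you want to run both test scripts on the same config from Python rather than the shell, a small wrapper like this (purely illustrative) works:

import subprocess

config_path = "configs/hico_trainvalzeroshot_analogy_vp.yaml"  # example config

# Run the plain model and the analogy model back to back.
for script in ["eval_hico.py", "eval_hico_analogy.py"]:
    print("Running {} ...".format(script))
    subprocess.check_call(["python", script, "--config_path", config_path])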

Evaluation

We use the official evaluation code to evaluate performance on HICO-DET.
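Evaluation on HICO-DET reports mean average precision over the interaction classes. As a rough, hypothetical illustration of that core step (not a replacement for the official code), an AP computation over a list of already-matched detections could look like this:

import numpy as np

def average_precision(scores, is_true_positive, num_ground_truth):
    # scores: confidence of each detection; is_true_positive: 0/1 flags from
    # matching detections to ground truth (e.g. both boxes at IoU >= 0.5).
    order = np.argsort(-np.asarray(scores, dtype=float))
    tp = np.asarray(is_true_positive, dtype=float)[order]
    cum_tp = np.cumsum(tp)
    cum_fp = np.cumsum(1.0 - tp)
    recall = cum_tp / max(num_ground_truth, 1)
    precision = cum_tp / np.maximum(cum_tp + cum_fp, 1e-12)
    # Area under the interpolated precision-recall curve.
    ap, prev_recall = 0.0, 0.0
    for i in range(len(recall)):
        ap += (recall[i] - prev_recall) * precision[i:].max()
        prev_recall = recall[i]
    return ap

print(average_precision([0.9, 0.8, 0.6], [1, 0, 1], num_ground_truth=2))  # ~0.83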

Erratum

Please note that the numerical results in the paper were obtained using a slightly different version of the analogy transformation than the one described in Eq. (6) of the paper. This variant computes the analogy transformation from the embeddings of the target subject, predicate and object in the unigram spaces, combined with the embeddings of the source subject, predicate and object in the visual phrase space.

You can choose between the two versions through the option --analogy_type. The default option, described above, is called 'hybrid'. To run the variant described in the paper, set --analogy_type='vp' in the config file, as in './configs/hico_trainvalzeroshot_analogy_vp.yaml'.

The 'vp' variant results in a performance drop of about 1% compared to the results in the paper (Table 2, s+o+vp+transfer (deep): 28.6 -> 27.5). The corresponding model is released in the runs/ directory. We are still investigating why the 'hybrid' version performs better than the 'vp' one.

We would like to thank Kenneth Wong from the Institute of Computing Technology, Chinese Academy of Sciences, for his careful code review and for pointing out this inconsistency.

We apologize for this inconvenience. Please do not hesitate to contact the first author for further clarification.

Cite

If you find this code useful in your research, please consider citing our paper:

@InProceedings{Peyre19,
  author    = "Peyre, Julia and Laptev, Ivan and Schmid, Cordelia and Sivic, Josef",
  title     = "Detecting Unseen Visual Relations Using Analogies",
  booktitle = "ICCV",
  year      = "2019"
}

Questions

For any questions, please contact the first author: julia.peyre@inria.fr