visual-tensor-decomposition

Code repository for the CVPR 2018 paper "Tensorize, Factorize, Regularize: Robust Visual Relationship Learning"


Seong Jae Hwang, Sathya N. Ravi, Zirui Tao, Hyunwoo J. Kim, Maxwell D. Collins, Vikas Singh, "Tensorize, Factorize and Regularize: Robust Visual Relationship Learning", Computer Vision and Pattern Recognition (CVPR), 2018.

http://pages.cs.wisc.edu/~sjh/

Dataset preparation

The raw Visual Genome data consists of the following files:

  1. Images: part1, part2
  2. Image metadata
  3. VG scene graph

The image database file imdb_1024.h5 is generated from files (1)-(3). The pipeline additionally requires:

  1. Scene graph database: VG-SGG.h5
  2. Scene graph database metadata: VG-SGG-dicts.json
  3. RoI proposals: proposals.h5
  4. RoI distribution: bbox_distribution.npy
  5. Faster-RCNN model
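
Once generated (or downloaded), the HDF5 databases can be inspected with h5py. A minimal sketch; it assumes only the file path above, not specific dataset names:

    import h5py

    # List every dataset in the image database along with its shape.
    with h5py.File('data/vg/imdb_1024.h5', 'r') as f:
        for name, dset in f.items():
            print('%s: %s' % (name, getattr(dset, 'shape', 'group')))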

Case 1: Using preprocessed data (coming soon)

  • Download the models (model link) and extract all files. In ./checkpoints, there are two directories for the baseline models: ./checkpoints/Xu_2 (by Xu et al.) and ./checkpoints/CKP_Vrd (by Lu et al.).

  • Save the Faster-RCNN model to data/pretrained.

  • Download the full imdb dataset, image metadata, VG scene graph, and the RoI database with its metadata. Check that the data/vg directory contains the following 5 files:

    imdb_1024.h5
    bbox_distribution.npy
    proposals.h5
    VG-SGG-dicts.json
    VG-SGG.h5
    
  • Download the Visual Genome image metadata and its scene graph, extract the files, and place all the json files under ./data_tools/VG. Check that the following 3 files are under the data_tools/VG directory (a quick sanity-check sketch follows this list):

    images_data.json
    objects.json
    relationships.json
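
A minimal sanity-check sketch for the two directories above (file names are taken from the lists in this section):

    import os

    vg_files = ['imdb_1024.h5', 'bbox_distribution.npy', 'proposals.h5',
                'VG-SGG-dicts.json', 'VG-SGG.h5']
    json_files = ['images_data.json', 'objects.json', 'relationships.json']

    # Report which expected files are present and which are missing.
    for d, names in [('data/vg', vg_files), ('data_tools/VG', json_files)]:
        for n in names:
            path = os.path.join(d, n)
            print('%s %s' % ('OK' if os.path.isfile(path) else 'MISSING', path))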
    

Case 2: Training from scratch

You need the following 5 files:

  1. Image database: imdb_1024.h5
  2. Scene graph database: VG-SGG.h5
  3. Scene graph database metadata: VG-SGG-dicts.json
  4. RoI proposals: proposals.h5
  5. RoI distribution: bbox_distribution.npy

(i). Download the dataset images: part1, part2.

(ii). Save the Faster-RCNN model to data/pretrained.

(iii). Place all the json files under data_tools/VG/ and the images under data_tools/VG/images.

(iv). Create the image database file imdb_1024.h5 by executing ./create_imdb.sh in this directory. This script creates an HDF5 database of images, imdb_1024.h5. The longer dimension of each image is resized to 1024 pixels and the shorter side is scaled accordingly (a sketch of this rule is shown below). You may also create an image database of a smaller size by editing the size argument of the script. You may skip to (viii) if you chose to download files (2)-(4).
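
A minimal sketch of the resizing rule, assuming PIL (the actual script may differ in library and interpolation details):

    from PIL import Image

    def resize_long_side(img, size=1024):
        # Scale the longer side to `size`; the shorter side follows
        # proportionally, preserving the aspect ratio.
        w, h = img.size
        scale = float(size) / max(w, h)
        return img.resize((int(round(w * scale)), int(round(h * scale))),
                          Image.BILINEAR)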

(v). Create the RoI database and its metadata by executing ./create_roidb.sh in this directory. The script creates a scene graph database file VG-SGG.h5 and its metadata VG-SGG-dicts.json. By default, the script reads the image dimensions from the imdb file created in (iv). If your imdb file uses a size other than 512 or 1024, you must add that size to the img_long_sizes list variable in the vg_to_roidb.py script, for example:
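
    # In vg_to_roidb.py: if your imdb was built at a size other than 512
    # or 1024, add that size here (e.g., for a hypothetical imdb_768.h5):
    img_long_sizes = [512, 1024, 768]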

(vi). Use the script provided by py-faster-rcnn to generate (4) proposals.h5.

(vii). Change line 93 of tools/train_net.py to True to generate (5) bbox_distribution.npy.

(viii). Finally, place files (1)-(5) in data/vg.

(ix). Check that the data/vg directory contains the following 5 files:

imdb_1024.h5
bbox_distribution.npy
proposals.h5
VG-SGG-dicts.json
VG-SGG.h5

Installing dependencies

Required dependencies:

  1. Create a Python 2.7 environment:
conda create -n tfr python=2.7
source activate tfr
  2. Install the dependency packages:
pip install -r requirement.txt

Note: Make sure that your TensorFlow is the r0.12 GPU-enabled version.

(Helpful instructions here for installing TensorFlow r0.12 on Ubuntu 14.04/16.04 and the associated software.)
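
A quick way to confirm the version and the GPU build, using only r0.12-era APIs (the device string and test values are arbitrary):

    import tensorflow as tf

    print(tf.__version__)  # expect something like '0.12.1'
    # Force a small op onto the GPU; this fails if no GPU kernel is available.
    with tf.device('/gpu:0'):
        y = tf.constant([1.0, 2.0]) * 2.0
    with tf.Session(config=tf.ConfigProto(log_device_placement=True)) as sess:
        print(sess.run(y))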

Compiling ROI pooling layer library

  1. After you have installed all the dependencies, run the following command to compile the nms and bbox libraries:
cd lib
make
  2. Follow this instruction to see whether you can use the pre-compiled roi-pooling custom op or have to compile the op yourself; a quick loading check is sketched below.
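
To check that the custom op loads, something like the following can be run (the .so path is an assumption based on the usual layout of Xu et al.'s repository; adjust it to where your compiled op actually lives):

    import tensorflow as tf

    # Raises tf.errors.NotFoundError if the op library is missing or was
    # compiled against an incompatible TensorFlow build.
    roi_pooling_module = tf.load_op_library('lib/roi_pooling_layer/roi_pooling.so')
    print(roi_pooling_module)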

Training (For Case 2 in Dataset preparation)

  1. Run

./experiments/scripts/train.sh dual_graph_vrd_final 2 CHECKPOINT_DIRECTORY GPU_ID SIGMA

The program saves a checkpoint to ./checkpoints/<_CHECKPOINT_DIRECTORY_>/ every 50,000 iterations. Training a full model on a desktop with an Intel i7 CPU, 64GB of memory, and a TitanX graphics card takes around 20 hours. You may use TensorBoard to visualize the training process; by default, the TensorFlow log directory is set to checkpoints/<_CHECKPOINT_DIRECTORY_>/tf_logs/.
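
For example, pointing TensorBoard at that default log directory:

tensorboard --logdir=checkpoints/CHECKPOINT_DIRECTORY/tf_logs/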

Evaluation

  1. Run
./experiments/scripts/test.sh <gpu_id> <checkpoint_dir> <checkpoint_file_prefix> <model_options> <number_of_inference_for_dual_graph_vrd_final> <number_images> <mode>

where <model_options> is one of:

 dual_graph_vrd_final: the model by Xu et al. (on which our implementation is based).

 vrd: the model by Lu et al.

The evaluation modes are:

sg_cls: predict the object classes and their relationships (predicates) given the ground-truth bounding boxes
sg_det (all): predict the object classes and their relationships (predicates) using the bounding boxes proposed by the region proposal network as object proposals

e.g.

./experiments/scripts/test.sh 0 CHECKPOINT_DIRECTORY FILE_PREFIX dual_graph_vrd_final 2 100 all

Visualization

  1. Run the same script as in Evaluation with one of the following three modes:
 viz_cls: visualize the sg_cls results
 viz_det: visualize the sg_det results
 viz_gt: visualize the ground truth

Note: If the code is fetched from Xu et al.'s scene graph repository, it is imperative to make the following changes:

  1. Replace line 26 of lib/roi_data_layer/minibatch.py with the following code (learn more about why doing this here):
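# np.round returns a float; the explicit int cast lets the value be used
# as an array size/index, which newer NumPy versions enforce.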
fg_rois_per_image = np.round(cfg.TRAIN.FG_FRACTION * rois_per_image).astype(np.int)
  2. Comment out the code block from lines 76 to 79 of tools/test_net.py: the checkpoints do not contain .ckpt files explicitly, and tf.train.Saver only needs the correct file prefix to successfully restore the model (see the sketch below). Learn more here.
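
A minimal sketch of the restore call, with placeholder paths matching the arguments above (the variable is illustrative; in the real code the model graph is already built):

    import tensorflow as tf

    w = tf.Variable(tf.zeros([1]), name='w')  # stand-in for the model variables
    saver = tf.train.Saver()
    with tf.Session() as sess:
        # Pass the checkpoint *prefix*, not a concrete .ckpt file; TensorFlow
        # resolves the underlying index/data files from the prefix itself.
        saver.restore(sess, './checkpoints/CHECKPOINT_DIRECTORY/FILE_PREFIX')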