cvpr17_vtranse

This implements "Visual Translation Embedding Network for Visual Relation Detection", Hanwang Zhang, Zawlin Kyaw, Shih-Fu Chang, Tat-Seng Chua (CVPR 2017).

A TensorFlow adaptation by yangxuntu is now also available, and it obtains a significant improvement on the vg dataset. You can find the code here: https://github.com/yangxuntu/vtranse

What's inside?

Download links

The files are on Google Drive. The direct link to the folder is https://drive.google.com/open?id=1BvtjCnlORMg4l92kNgZ2g1YaHYj9Dy3X

Coming soon

  • end to end training instructions and code
  • demo code

Setup

Object Detector

Ensure the data folder looks like this.

zawlin@zlgpu:~/g/cvpr17_vtranse/data$ tree -l -L 4 -d
.
├── demo
├── scripts
├── sg_vrd_2016 -> /media/zawlin/ssd/data/vrd/vrd/sg
│   ├── Annotations
│   │   ├── sg_test_images
│   │   └── sg_train_images
│   ├── Data
│   │   ├── sg_test_images
│   │   └── sg_train_images
│   ├── devkit
│   │   ├── data
│   │   │   └── ilsvrc_det_sample
│   │   └── evaluation
│   └── ImageSets
└── vg1_2_2016 -> /media/zawlin/ssd/data/vrd/vg_1.2/voc_format
    ├── Annotations
    │   ├── test
    │   │   ├── VG_100K
    │   │   └── VG_100K_2
    │   └── train
    │       ├── VG_100K
    │       └── VG_100K_2
    ├── Data
    │   ├── test
    │   │   ├── VG_100K
    │   │   └── VG_100K_2
    │   └── train
    │       ├── VG_100K
    │       └── VG_100K_2
    ├── devkit
    │   ├── data
    │   │   └── ilsvrc_det_sample
    │   └── evaluation
    └── ImageSets
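
The layout above can be checked before training with a small script. This is only a sketch and not part of the repository: the helper name `missing_dirs` and the `EXPECTED_DIRS` table are made up for illustration, the directory names are copied from the tree listing (top levels only), and the code assumes Python 3.

```python
from pathlib import Path

# Sub-directories copied from the tree listing above (top levels only).
# This table is an illustration, not something the repository ships.
EXPECTED_DIRS = {
    "sg_vrd_2016": [
        "Annotations/sg_train_images",
        "Annotations/sg_test_images",
        "Data/sg_train_images",
        "Data/sg_test_images",
        "devkit",
        "ImageSets",
    ],
    "vg1_2_2016": [
        "Annotations/train",
        "Annotations/test",
        "Data/train",
        "Data/test",
        "devkit",
        "ImageSets",
    ],
}


def missing_dirs(data_root, dataset):
    """Return the expected sub-directories of `dataset` that are absent.

    Symbolic links are followed, so a linked dataset folder is accepted.
    """
    base = Path(data_root) / dataset
    return [rel for rel in EXPECTED_DIRS[dataset] if not (base / rel).is_dir()]
```

Run from the repository root, `missing_dirs("data", "sg_vrd_2016")` should return an empty list once the symbolic links are in place.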

Training And Evaluation Instructions

I am using Ubuntu 16.04 with gcc 5.4. If you run into protobuf errors, recompiling protobuf from source will usually eliminate them. When I refer to folders, the paths are relative to the root of the GitHub source folder.

The steps below are for the vrd dataset. For vg, the steps are similar; you will just need to change some folder or file paths to point to the vg directory or scripts.

  • First clone the repo
    • git clone git@github.com:zawlin/cvpr17_vtranse.git
    • git submodule update --init --recursive
  • cd into the caffe-fast-rcnn folder to build Caffe. This step is the same as building py-faster-rcnn. You also need to copy Makefile.config.ubuntu16 to Makefile.config before you run the make command.
  • Prepare the data by creating symbolic links under the data folder as described in the previous section. You will also need to copy the dataset hdf5 files to the data folder and the object detector models into the model folder.
  • Make the nms module.
    • cd lib;make
    • Cython is required for this step. You can install it with pip install cython --user.
  • A working object detector is required at this stage. Please refer to the py-faster-rcnn training instructions, and note that the number of anchors needs to be modified to successfully train the detector.
  • Next, generate the visual features for training the relation model.
    • python lib/vrd/save_cache.py
    • After this step, sg_vrd_2016_test.hdf5 and sg_vrd_2016_train.hdf5 should be generated under the output folder if you are training for the vrd dataset.
  • Run the visual relation training.
    • python lib/vrd/run_train_relation.py
  • Generate results in matlab mat format for evaluation.
    • python lib/vrd/run_relation_test.py
    • This will generate a file under output, e.g., sg_vrd_2016_result_all_50000.mat.
  • Copy the generated mat file to relation_evaluation/vr/data/vtranse_results.mat
    • cp output/sg_vrd_2016_result_all_50000.mat relation_evaluation/vr/data/vtranse_results.mat
  • Run the matlab evaluation script at relation_evaluation/vr/eval_vtranse.m
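
The copy step above can also be scripted so that the newest result file is always the one that gets evaluated. The following is a minimal sketch, not code from the repository. It assumes Python 3, assumes that result files follow the pattern of the example name sg_vrd_2016_result_all_50000.mat, and assumes that the trailing number is the training iteration. The function name `stage_latest_result` is made up for illustration.

```python
import re
import shutil
from pathlib import Path

# Pattern inferred from the example file name sg_vrd_2016_result_all_50000.mat.
# Treating the trailing number as the training iteration is an assumption.
RESULT_PATTERN = re.compile(r"^sg_vrd_2016_result_all_(\d+)\.mat$")


def stage_latest_result(output_dir="output",
                        target="relation_evaluation/vr/data/vtranse_results.mat"):
    """Copy the result file with the highest trailing number to `target`.

    Returns the path of the file that was copied.
    """
    candidates = []
    for path in Path(output_dir).iterdir():
        match = RESULT_PATTERN.match(path.name)
        if match:
            candidates.append((int(match.group(1)), path))
    if not candidates:
        raise FileNotFoundError(f"no result .mat files found in {output_dir}")

    # Compare numerically so that 50000 ranks above 9000.
    _, newest = max(candidates, key=lambda item: item[0])

    target = Path(target)
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(newest, target)
    return newest
```

Calling `stage_latest_result()` from the repository root has the same effect as the cp command in the list, after which eval_vtranse.m can be run in matlab as described.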

Citation

If you're using this code in a scientific publication, please cite:

@inproceedings{Zhang_2017_CVPR,
  author    = {Hanwang Zhang and Zawlin Kyaw and Shih-Fu Chang and Tat-Seng Chua},
  title     = {Visual Translation Embedding Network for Visual Relation Detection},
  booktitle = {CVPR},
  year      = {2017},
}