
Code for "DIRV: Dense Interaction Region Voting for End-to-End Human-Object Interaction Detection" (AAAI 2021)


DIRV: Dense Interaction Region Voting for End-to-End Human-Object Interaction Detection

Official code implementation of the paper "DIRV: Dense Interaction Region Voting for End-to-End Human-Object Interaction Detection" (AAAI 2021).

The code is built on the architecture of zylo117/Yet-Another-EfficientDet-Pytorch. We also follow some of the data pre-processing and model evaluation methods from BigRedT/no_frills_hoi_det and vt-vl-lab/iCAN. We sincerely thank the authors for their excellent work.

Checklist

  • Training and testing on the V-COCO dataset
  • Training and testing on the HICO-DET dataset
  • Demonstration on images
  • Demonstration on videos
  • A more efficient GPU-based voting strategy for inference

Prerequisites

The code was tested with Python 3.6, PyTorch 1.5.1, torchvision 0.6.1, CUDA 10.2, and Ubuntu 18.04.

Installation

  1. Clone this repository:

    git clone https://github.com/MVIG-SJTU/DIRV.git
    
  2. Install PyTorch and torchvision:

    pip install torch==1.5.1 torchvision==0.6.1
    
  3. Install other necessary packages:

    pip install pycocotools numpy opencv-python tqdm tensorboard tensorboardX pyyaml webcolors
    

Data Preparation

V-COCO Dataset:

Download the V-COCO dataset following the official instructions.

You can find the file new_prior_mask.pkl here. Each element in it is the prior probability that a verb (e.g. eat) is associated with an object category (e.g. apple). You should also download the annotation file for the combined training and validation sets, instances_trainval2014.json, here, and put it in datasets/vcoco/coco/annotations.
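To illustrate how a verb-object prior like this can be used, the sketch below rescales each verb score by the prior for the detected object category, so that implausible pairs (e.g. "drive apple") are suppressed. The internal layout of new_prior_mask.pkl is not documented here: the sketch assumes it can be indexed as prior[verb][object_class], and the helper name apply_prior is hypothetical, not a function from the DIRV code. Inspect the real file (e.g. with pickle.load) before adapting this.

```python
from typing import List, Sequence


def apply_prior(verb_scores: Sequence[float],
                prior: List[List[float]],
                obj_class: int) -> List[float]:
    """Hypothetical helper: scale each verb score by the verb-object prior.

    Assumes prior[v][c] is the prior probability that verb v occurs with
    object category c (layout of the real .pkl file is an assumption).
    """
    if len(verb_scores) != len(prior):
        raise ValueError("expected one score per verb")
    return [s * prior[v][obj_class] for v, s in enumerate(verb_scores)]


if __name__ == "__main__":
    # Toy prior: 3 verbs x 2 object categories (0 = apple, 1 = car).
    toy_prior = [
        [1.0, 0.0],  # "eat": plausible for apple, not for car
        [1.0, 1.0],  # "hold": plausible for both
        [0.0, 1.0],  # "drive": plausible for car only
    ]
    print(apply_prior([0.9, 0.5, 0.7], toy_prior, obj_class=0))  # [0.9, 0.5, 0.0]
```

A real prior would usually hold soft probabilities rather than 0/1 values, in which case the multiplication down-weights rather than removes unlikely verbs.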

HICO-DET Dataset:

Download the HICO-DET dataset from the official website.

We converted the HICO-DET annotations to JSON format following BigRedT/no_frills_hoi_det. You can download the processed annotations directly from here.

The number of training samples in each category is stored in hico_processed/hico-det_verb_count.json. These counts serve as weights when computing the loss.
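The README does not say exactly how the counts become loss weights. One common choice, shown here purely as an assumption, is inverse-frequency weighting rescaled to a mean of 1, applied inside a per-class binary cross-entropy. The JSON layout (a mapping from verb name to count, as the file name suggests) and the function names are hypothetical.

```python
import json
import math
from typing import Dict, List


def inverse_frequency_weights(counts: Dict[str, int]) -> Dict[str, float]:
    """Assumed scheme: weight proportional to 1 / count, rescaled so the mean weight is 1."""
    raw = {verb: 1.0 / max(n, 1) for verb, n in counts.items()}  # max() guards against zero counts
    mean = sum(raw.values()) / len(raw)
    return {verb: w / mean for verb, w in raw.items()}


def weighted_bce(probs: List[float], targets: List[int], weights: List[float]) -> float:
    """Mean binary cross-entropy over verb classes, each term scaled by its class weight."""
    eps = 1e-7
    total = 0.0
    for p, t, w in zip(probs, targets, weights):
        p = min(max(p, eps), 1.0 - eps)  # clamp so log() never sees 0
        total += -w * (t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(probs)


if __name__ == "__main__":
    # Hypothetical file contents: {verb_name: number_of_training_samples}.
    counts = json.loads('{"ride": 300, "hold": 900, "feed": 100}')
    weights = inverse_frequency_weights(counts)
    print(weights)  # the rarest verb ("feed") receives the largest weight
    verbs = list(counts)
    loss = weighted_bce([0.8, 0.1, 0.3], [1, 0, 1], [weights[v] for v in verbs])
    print(round(loss, 4))
```

Normalizing to a mean of 1 keeps the overall loss scale comparable to the unweighted case while still boosting rare verbs.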

Dataset Structure:

Make sure to put the files in the following structure:

|-- datasets
|   |-- vcoco
|   |   |-- data
|   |   |   |-- splits
|   |   |   |-- vcoco
|   |   |
|   |   |-- coco
|   |   |   |-- images
|   |   |   |-- annotations
|   |   |
|   |   |-- new_prior_mask.pkl
|   |
|   |-- hico_20160224_det
|   |   |-- images
|   |   |-- hico_processed
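Before training, a small helper script (not part of the repository) can confirm that this layout is in place. The path list below is transcribed from the dataset tree above, plus the instances_trainval2014.json location given in the V-COCO instructions, and the script assumes it is run from the repository root.

```python
from pathlib import Path
from typing import List

# Expected layout relative to the repository root (transcribed from the README).
EXPECTED_PATHS = [
    "datasets/vcoco/data/splits",
    "datasets/vcoco/data/vcoco",
    "datasets/vcoco/coco/images",
    "datasets/vcoco/coco/annotations",
    "datasets/vcoco/coco/annotations/instances_trainval2014.json",
    "datasets/vcoco/new_prior_mask.pkl",
    "datasets/hico_20160224_det/images",
    "datasets/hico_20160224_det/hico_processed",
]


def missing_paths(root: str = ".") -> List[str]:
    """Return the expected dataset paths that do not exist under root."""
    base = Path(root)
    return [p for p in EXPECTED_PATHS if not (base / p).exists()]


if __name__ == "__main__":
    missing = missing_paths()
    if missing:
        print("Missing:")
        for p in missing:
            print("  " + p)
    else:
        print("Dataset layout looks complete.")
```

The check only tests for existence; it does not validate file contents.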

Demonstration

Demonstration on Images

CUDA_VISIBLE_DEVICES=0 python demo.py --image_path /path/to/a/single/image
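demo.py takes a single image through --image_path. To process a whole folder, one option is a small wrapper (hypothetical, not part of the repository) that calls demo.py once per image using only that documented flag. Where demo.py writes its output is not described here, so check the script before relying on the results.

```python
import subprocess
import sys
from pathlib import Path
from typing import List

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def build_demo_commands(image_dir: str) -> List[List[str]]:
    """Build one `python demo.py --image_path <img>` command per image file, sorted by path."""
    images = sorted(p for p in Path(image_dir).iterdir()
                    if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES)
    return [[sys.executable, "demo.py", "--image_path", str(p)] for p in images]


if __name__ == "__main__" and len(sys.argv) > 1:
    for cmd in build_demo_commands(sys.argv[1]):
        # check=True stops at the first image that fails
        subprocess.run(cmd, check=True)
```

Saved as, say, run_demo_folder.py in the repository root, it could be run as `CUDA_VISIBLE_DEVICES=0 python run_demo_folder.py /path/to/images`; the environment variable is inherited by each child process.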

Demonstration on Videos

Coming soon.

Pre-trained Weights

You can download the pre-trained weights for the V-COCO dataset (vcoco_best.pth) and the HICO-DET dataset (hico-det_best.pth) here.

Training

Download the pre-trained weights of our backbone (efficientdet-d3_vcoco.pth and efficientdet-d3_hico-det.pth) here, and save them in the weights/ directory.

Training on V-COCO Dataset

CUDA_VISIBLE_DEVICES=0,1,2,3 python train.py -p vcoco --batch_size 32 --load_weights weights/efficientdet-d3_vcoco.pth

Training on HICO-DET Dataset

CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python train.py -p hico-det --batch_size 48 --load_weights weights/efficientdet-d3_hico-det.pth

You can also adjust the save directory and the number of GPUs in projects/vcoco.yaml and projects/hico-det.yaml, or create your own project files in projects/.
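In the upstream zylo117/Yet-Another-EfficientDet-Pytorch training script, --batch_size is, as we understand it, the number of images per batch across all devices. Assuming DIRV keeps that convention, the training commands above place 32 / 4 = 8 images on each GPU for V-COCO and 48 / 8 = 6 for HICO-DET. The hypothetical helper below does this arithmetic from CUDA_VISIBLE_DEVICES, which can help keep the per-GPU load constant when you change the GPU count.

```python
import os
from typing import Optional


def per_gpu_batch(total_batch: int, visible_devices: Optional[str] = None) -> int:
    """Split a total batch size evenly across the GPUs in CUDA_VISIBLE_DEVICES.

    Assumes --batch_size is the total over all devices (upstream convention).
    This sketch requires an even split so every GPU gets the same load.
    """
    if visible_devices is None:
        visible_devices = os.environ.get("CUDA_VISIBLE_DEVICES", "0")
    devices = [d for d in visible_devices.split(",") if d.strip()]
    n = len(devices)
    if n == 0:
        raise ValueError("no visible GPUs")
    if total_batch % n:
        raise ValueError(f"batch size {total_batch} is not divisible by {n} GPUs")
    return total_batch // n


if __name__ == "__main__":
    print(per_gpu_batch(32, "0,1,2,3"))          # 8 (V-COCO command)
    print(per_gpu_batch(48, "0,1,2,3,4,5,6,7"))  # 6 (HICO-DET command)
```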

Test

Test on V-COCO Dataset

CUDA_VISIBLE_DEVICES=0 python test_vcoco.py -w /path/to/checkpoint

Test on HICO-DET Dataset

CUDA_VISIBLE_DEVICES=0 python test_hico-det.py -w /path/to/checkpoint

Then follow the evaluation procedure in vt-vl-lab/iCAN to evaluate the results on the HICO-DET dataset.
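For intuition about what that evaluation measures: under the HICO-DET protocol, a detected human-object pair for a given HOI category counts as a true positive only if both its human box and its object box reach an IoU of at least 0.5 with the same ground-truth pair, and each ground-truth pair can be matched once. The sketch below implements that rule in PASCAL VOC style for one image and one category. It is a simplified stand-in, not the iCAN or official evaluation code (which may differ in details such as pixel conventions), and average precision would still need to be computed from the returned flags.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes (continuous coordinates)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def match_detections(dets: Sequence[Tuple[float, Box, Box]],
                     gts: Sequence[Tuple[Box, Box]],
                     thresh: float = 0.5) -> List[bool]:
    """VOC-style matching for one image and one HOI category.

    dets: (score, human_box, object_box); gts: (human_box, object_box).
    Returns a true-positive flag per detection, in descending score order.
    """
    used = [False] * len(gts)
    flags: List[bool] = []
    for _, h, o in sorted(dets, key=lambda d: d[0], reverse=True):
        # Overlap of a pair = the weaker of its two box IoUs,
        # so a pair passes only if BOTH boxes clear the threshold.
        overlaps = [min(iou(h, gh), iou(o, go)) for gh, go in gts]
        j = max(range(len(gts)), key=overlaps.__getitem__, default=-1)
        tp = j >= 0 and overlaps[j] >= thresh and not used[j]
        if tp:
            used[j] = True  # each ground-truth pair can be matched only once
        flags.append(tp)
    return flags
```

A second detection of an already-matched pair is counted as a false positive, which is what penalizes duplicate predictions.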

Citation

If you find our paper or code useful for your research, please cite the following paper:

@inproceedings{fang2020dirv,
      title={DIRV: Dense Interaction Region Voting for End-to-End Human-Object Interaction Detection}, 
      author={Fang, Hao-Shu and Xie, Yichen and Shao, Dian and Lu, Cewu},
      year={2021},
      booktitle = {The AAAI Conference on Artificial Intelligence (AAAI)}
}