Official implementation of "Pose-aware Multi-level Feature Network for Human Object Interaction Detection"(ICCV 2019 Oral).
This code follows the implementation architecture of roytseng-tw/mask-rcnn.pytorch.
Tested under python3.
- python packages
- pytorch==0.4.1
- torchvision==0.2.2
- pyyaml==3.12
- cython
- matplotlib
- numpy
- scipy
- opencv
- packaging
- ipdb
- pycocotools — for COCO dataset, also available from pip.
- tensorboardX — for logging the losses in Tensorboard
- An NVIDIA GPU and CUDA 8.0 or higher. Some operations only have GPU implementations (a quick environment check is sketched below).
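If you want a quick sanity check that your environment matches the versions listed above, something along these lines works (this snippet is just a convenience, not part of the original setup):

```python
# Quick environment sanity check for the versions listed above.
import torch
import torchvision
import cv2

print("pytorch:", torch.__version__)            # expected 0.4.1
print("torchvision:", torchvision.__version__)  # expected 0.2.2
print("opencv:", cv2.__version__)

# Some ops only have GPU implementations, so CUDA must be available.
assert torch.cuda.is_available(), "A CUDA-capable NVIDIA GPU is required."
print("CUDA device:", torch.cuda.get_device_name(0))
```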
Assume the project is located at $ROOT.
Compile the NMS code:
```bash
cd $ROOT/lib
sh make.sh
```
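For reference, the operation being compiled here is standard greedy non-maximum suppression. The pure-NumPy sketch below only illustrates what the compiled CUDA/Cython code computes; it is not the code that gets built:

```python
# Reference sketch of greedy non-maximum suppression (illustration only).
import numpy as np

def nms(dets, thresh):
    """dets: (N, 5) array of [x1, y1, x2, y2, score]; returns kept indices."""
    x1, y1, x2, y2, scores = dets[:, 0], dets[:, 1], dets[:, 2], dets[:, 3], dets[:, 4]
    areas = (x2 - x1 + 1) * (y2 - y1 + 1)
    order = scores.argsort()[::-1]  # highest score first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        # Intersection of the current top-scoring box with all remaining boxes.
        xx1 = np.maximum(x1[i], x1[order[1:]])
        yy1 = np.maximum(y1[i], y1[order[1:]])
        xx2 = np.minimum(x2[i], x2[order[1:]])
        yy2 = np.minimum(y2[i], y2[order[1:]])
        w = np.maximum(0.0, xx2 - xx1 + 1)
        h = np.maximum(0.0, yy2 - yy1 + 1)
        iou = w * h / (areas[i] + areas[order[1:]] - w * h)
        # Keep only boxes whose overlap with the current box is below the threshold.
        order = order[1:][iou <= thresh]
    return keep

# Two heavily overlapping boxes plus a distinct one -> keeps indices [0, 2].
print(nms(np.array([[0, 0, 10, 10, 0.9],
                    [1, 1, 11, 11, 0.8],
                    [50, 50, 60, 60, 0.7]], dtype=np.float32), 0.5))
```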
Create a data folder under the repo:
```bash
cd $ROOT
mkdir data
```
- COCO: Download the COCO images and annotations from the COCO website.
- Our data: Download our dataset annotations and detection/keypoint proposals from Google Drive or BaiduYun.
- Pose estimation: We use the pytorch-cpn repo to train our pose estimator; our keypoint predictions for the V-COCO dataset are included in the released data above.
Make sure to organize the files in the following structure:
```
data
├───coco
│    ├─images
│    │    ├─train2014
│    │    ├─val2014
│    │
│    ├─vcoco
│         ├─annotations
│         ├─annotations_with_keypoints
│         ├─vcoco
│
├───cache
│    ├─addPredPose
│
├───pretrained_model
     ├─e2e_faster_rcnn_R-50-FPN_1x_step119999.pth
     ├─vcoco_best_model_on_test.pth
```
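A small script along the following lines can confirm everything is in place before training. The paths are taken from the tree above and assume you run it from $ROOT; adjust them if your layout differs:

```python
# Verify the expected data layout under $ROOT/data (paths from the tree above).
import os

expected = [
    "data/coco/images/train2014",
    "data/coco/images/val2014",
    "data/coco/vcoco/annotations",
    "data/coco/vcoco/annotations_with_keypoints",
    "data/coco/vcoco/vcoco",
    "data/cache/addPredPose",
    "data/pretrained_model/e2e_faster_rcnn_R-50-FPN_1x_step119999.pth",
    "data/pretrained_model/vcoco_best_model_on_test.pth",
]
for rel in expected:
    print("OK  " if os.path.exists(rel) else "MISSING", rel)
```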
Train on V-COCO:
```bash
cd $ROOT
sh script/train_vcoco.sh
```
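Training losses are logged with tensorboardX (listed in the requirements above). The snippet below is only a generic sketch of that logging pattern with a made-up log directory, not the repo's actual writer setup; point `tensorboard --logdir <dir>` at whichever directory the training script actually writes to:

```python
# Generic tensorboardX logging pattern (illustrative; the repo's actual
# writer setup and log directory may differ).
from tensorboardX import SummaryWriter

writer = SummaryWriter(log_dir="Outputs/tb_example")  # hypothetical directory
for step in range(100):
    loss = 1.0 / (step + 1)  # placeholder standing in for the training loss
    writer.add_scalar("train/total_loss", loss, step)
writer.close()
```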
Test on V-COCO:
```bash
cd $ROOT
sh script/test_vcoco.sh
```
Our pretrained model vcoco_best_model_on_test.pth achieves 52.05 AP on the V-COCO test set.
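To confirm the downloaded checkpoint deserializes correctly, you can inspect it with torch.load. Its internal structure is not documented here, so this sanity check only lists whatever top-level keys it finds:

```python
# Sanity-check that the pretrained checkpoint can be deserialized.
import torch

ckpt_path = "data/pretrained_model/vcoco_best_model_on_test.pth"
checkpoint = torch.load(ckpt_path, map_location="cpu")
# The exact contents depend on how the checkpoint was saved; just report
# what is there rather than assuming particular keys.
if isinstance(checkpoint, dict):
    print("top-level keys:", list(checkpoint.keys()))
else:
    print("loaded object of type:", type(checkpoint))
```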