Official implementation of "Pose-aware Multi-level Feature Network for Human Object Interaction Detection"(ICCV 2019 Oral).
This code follows the implementation architecture of roytseng-tw/mask-rcnn.pytorch.
Tested under python3.
- python packages
- pytorch==0.4.1
- torchvision==0.2.2
- pyyaml==3.12
- cython
- matplotlib
- numpy
- scipy
- opencv
- packaging
- ipdb
- pycocotools — for COCO dataset, also available from pip.
- tensorboardX — for logging the losses in Tensorboard
- An NVIDIA GPU and CUDA 8.0 or higher. Some operations only have GPU implementations (a quick environment check is sketched below).
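If you want a quick sanity check that your environment matches the versions listed above, something along these lines works (this snippet is just a convenience, not part of the original setup):

```python
# Quick environment sanity check for the versions listed above.
import torch
import torchvision
import cv2

print("pytorch:", torch.__version__)            # expected 0.4.1
print("torchvision:", torchvision.__version__)  # expected 0.2.2
print("opencv:", cv2.__version__)

# Some ops only have GPU implementations, so CUDA must be available.
assert torch.cuda.is_available(), "A CUDA-capable NVIDIA GPU is required."
print("CUDA device:", torch.cuda.get_device_name(0))
```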
Assume the project is located at $ROOT.
Compile the NMS code:
```bash
cd $ROOT/lib
sh make.sh
```
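For reference, the operation being compiled here is standard greedy non-maximum suppression. The pure-NumPy sketch below only illustrates what the compiled CUDA/Cython code computes; it is not the code that gets built:

```python
# Reference sketch of greedy non-maximum suppression (illustration only).
import numpy as np

def nms(dets, thresh):
    """dets: (N, 5) array of [x1, y1, x2, y2, score]; returns kept indices."""
    x1, y1, x2, y2, scores = dets[:, 0], dets[:, 1], dets[:, 2], dets[:, 3], dets[:, 4]
    areas = (x2 - x1 + 1) * (y2 - y1 + 1)
    order = scores.argsort()[::-1]  # highest score first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        # Intersection of the current top-scoring box with all remaining boxes.
        xx1 = np.maximum(x1[i], x1[order[1:]])
        yy1 = np.maximum(y1[i], y1[order[1:]])
        xx2 = np.minimum(x2[i], x2[order[1:]])
        yy2 = np.minimum(y2[i], y2[order[1:]])
        w = np.maximum(0.0, xx2 - xx1 + 1)
        h = np.maximum(0.0, yy2 - yy1 + 1)
        iou = w * h / (areas[i] + areas[order[1:]] - w * h)
        # Keep only boxes whose overlap with the current box is below the threshold.
        order = order[1:][iou <= thresh]
    return keep

# Two heavily overlapping boxes plus a distinct one -> keeps indices [0, 2].
print(nms(np.array([[0, 0, 10, 10, 0.9],
                    [1, 1, 11, 11, 0.8],
                    [50, 50, 60, 60, 0.7]], dtype=np.float32), 0.5))
```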
Create a data folder under the repo:
```bash
cd $ROOT
mkdir data
```
- COCO: Download the COCO images and annotations from the COCO website.
- Our data: Download our dataset annotations and detection/keypoint proposals from Google Drive or BaiduYun.
- Pose estimation: We use the pytorch-cpn repo to train our pose estimator; our keypoint predictions for the V-COCO dataset are included in the released data above.
Make sure to organize the files in the following structure:
```
data
├───coco
│    ├─images
│    │    ├─train2014
│    │    ├─val2014
│    │
│    ├─vcoco
│         ├─annotations
│         ├─annotations_with_keypoints
│         ├─vcoco
│
├───cache
│    ├─addPredPose
│
├───pretrained_model
     ├─e2e_faster_rcnn_R-50-FPN_1x_step119999.pth
     ├─vcoco_best_model_on_test.pth
```
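A small script along the following lines can confirm everything is in place before training. The paths are taken from the tree above and assume you run it from $ROOT; adjust them if your layout differs:

```python
# Verify the expected data layout under $ROOT/data (paths from the tree above).
import os

expected = [
    "data/coco/images/train2014",
    "data/coco/images/val2014",
    "data/coco/vcoco/annotations",
    "data/coco/vcoco/annotations_with_keypoints",
    "data/coco/vcoco/vcoco",
    "data/cache/addPredPose",
    "data/pretrained_model/e2e_faster_rcnn_R-50-FPN_1x_step119999.pth",
    "data/pretrained_model/vcoco_best_model_on_test.pth",
]
for rel in expected:
    print("OK  " if os.path.exists(rel) else "MISSING", rel)
```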
Train on V-COCO:
```bash
cd $ROOT
sh script/train_vcoco.sh
```
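Training losses are logged with tensorboardX (listed in the requirements above). The snippet below is only a generic sketch of that logging pattern with a made-up log directory, not the repo's actual writer setup; point `tensorboard --logdir <dir>` at whichever directory the training script actually writes to:

```python
# Generic tensorboardX logging pattern (illustrative; the repo's actual
# writer setup and log directory may differ).
from tensorboardX import SummaryWriter

writer = SummaryWriter(log_dir="Outputs/tb_example")  # hypothetical directory
for step in range(100):
    loss = 1.0 / (step + 1)  # placeholder standing in for the training loss
    writer.add_scalar("train/total_loss", loss, step)
writer.close()
```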
Test on V-COCO:
```bash
cd $ROOT
sh script/test_vcoco.sh
```
Our pretrained model vcoco_best_model_on_test.pth achieves 52.05 AP on the V-COCO test set.
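To confirm the downloaded checkpoint deserializes correctly, you can inspect it with torch.load. Its internal structure is not documented here, so this sanity check only lists whatever top-level keys it finds:

```python
# Sanity-check that the pretrained checkpoint can be deserialized.
import torch

ckpt_path = "data/pretrained_model/vcoco_best_model_on_test.pth"
checkpoint = torch.load(ckpt_path, map_location="cpu")
# The exact contents depend on how the checkpoint was saved; just report
# what is there rather than assuming particular keys.
if isinstance(checkpoint, dict):
    print("top-level keys:", list(checkpoint.keys()))
else:
    print("loaded object of type:", type(checkpoint))
```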