Our implementation depends on external libraries such as NumPy and PyTorch. You can install the dependencies with the following commands.
pip install numpy
pip install torch==1.8.1+cu111 torchvision==0.9.1+cu111 torchaudio==0.8.1 -f https://download.pytorch.org/whl/torch_stable.html
pip install "git+https://github.com/cocodataset/cocoapi.git#subdirectory=PythonAPI&egg=pycocotools"
pip install git+https://github.com/cocodataset/panopticapi.git#egg=panopticapi
pip install scipy cython submitit
Note that these commands may print errors while installing pycocotools, but the errors can be safely ignored.
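After installation, a quick sanity check such as the following can be run (a minimal sketch; the expected version strings correspond to the pinned wheels above, and CUDA availability depends on your machine).

# sanity_check.py -- verify that the core dependencies import correctly
import numpy as np
import torch
import torchvision
from pycocotools.coco import COCO  # import check only

print("numpy:", np.__version__)
print("torch:", torch.__version__)               # expected: 1.8.1+cu111
print("torchvision:", torchvision.__version__)   # expected: 0.9.1+cu111
print("CUDA available:", torch.cuda.is_available())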
The HICO-DET dataset can be downloaded here. After the download finishes, unpack the tarball (hico_20160224_det.tar.gz) into the data directory.
Instead of the original annotation files, we use the annotation files provided by the PPDM authors, which can be downloaded from here. Place the downloaded annotation files as shown in the directory layout below.
First clone the V-COCO repository from here and follow its instructions to generate the file instances_vcoco_all_2014.json. Next, download the prior file prior.pickle from here. Place the files and create the directories as follows (a sanity-check sketch is given after the layout).
neubla_hoi_att
 |─ data
 |   |─ v-coco
 |   |   |─ data
 |   |   |   |─ instances_vcoco_all_2014.json
 |   |   |   :
 |   |   |─ prior.pickle
 |   |   |─ images
 |   |   |   |─ train2014
 |   |   |   |   |─ COCO_train2014_000000000009.jpg
 |   |   |   |   :
 |   |   |   └─ val2014
 |   |   |       |─ COCO_val2014_000000000042.jpg
 |   |   |       :
 |   |   |─ annotations
 |   |   |   |─ corre_vcoco.npy
 |   |   |   |─ trainval_vcoco.json
 |   |   |   |─ test_vcoco.json
 |   |   :   :
 |   |─ hico_20160224_det
 |   |   |─ images
 |   |   |   |─ train2015
 |   |   |   |   |─ HICO_train2015_00000001.jpg
 |   |   |   |   :
 |   |   |   └─ test2015
 |   |   |       |─ HICO_test2015_00000001.jpg
 |   |   |       :
 |   |   |─ annotations
 |   |   |   |─ corre_hico.npy
 |   |   |   |─ trainval_hico.json
 |   |   |   |─ test_hico.json
 |   |   :   :
 |   └─ vaw
 |       |─ images
 |       |   |─ VG_100K
 |       |   |   |─ 2.jpg
 |       |   |   :
 |       |   └─ VG_100K_2
 |       |       |─ 1.jpg
 |       |       :
 |       |─ annotations
 |       |   |─ attribute_index.json
 |       |   |─ vaw_coco_train.json
 |       |   |─ vaw_coco_test.json
 |       |   |─ vaw_coco_train_cat_info.json
 |       :   :
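If you want to verify the layout before training, a short check like the following can be used (a minimal sketch; the script name is hypothetical and the list only covers the entries shown above, so extend it as needed).

# check_data_layout.py -- verify that the expected dataset files and directories exist
from pathlib import Path

REQUIRED = [
    "data/v-coco/data/instances_vcoco_all_2014.json",
    "data/v-coco/prior.pickle",
    "data/v-coco/images/train2014",
    "data/v-coco/images/val2014",
    "data/hico_20160224_det/images/train2015",
    "data/hico_20160224_det/images/test2015",
    "data/hico_20160224_det/annotations/trainval_hico.json",
    "data/hico_20160224_det/annotations/test_hico.json",
    "data/vaw/images/VG_100K",
    "data/vaw/annotations/vaw_coco_train.json",
]

missing = [p for p in REQUIRED if not Path(p).exists()]
if missing:
    print("Missing entries:")
    for p in missing:
        print("  " + p)
else:
    print("All expected files and directories are present.")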
For our implementation, the annotation files have to be converted to the HOIA format. The conversion can be carried out as follows.
PYTHONPATH=data/v-coco \
python convert_vcoco_annotations.py \
--load_path data/v-coco/data \
--prior_path data/v-coco/prior.pickle \
--save_path data/v-coco/annotations
Note that only Python 2 can be used for this conversion because vsrl_utils.py in the V-COCO repository raises an error with Python 3. The V-COCO annotations in the HOIA format, corre_vcoco.npy, test_vcoco.json, and trainval_vcoco.json, will be generated in the annotations directory.
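To confirm that the conversion succeeded, the generated files can be inspected with a short script such as the one below (a minimal sketch; it only reports sizes and makes no assumption about the annotation schema).

# inspect_vcoco_annotations.py -- report the size of the converted V-COCO annotation files
import json
import numpy as np

corre = np.load("data/v-coco/annotations/corre_vcoco.npy")
print("corre_vcoco.npy shape:", corre.shape)

for name in ("trainval_vcoco.json", "test_vcoco.json"):
    with open("data/v-coco/annotations/" + name) as f:
        annos = json.load(f)
    print(name + ":", len(annos), "entries")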
Our QPIC model has to be pre-trained on the COCO object detection dataset. For HICO-DET training, this pre-training can be skipped by using the DETR parameters, which can be downloaded from here for the ResNet-50 backbone and here for the ResNet-101 backbone. For V-COCO training, the pre-training has to be carried out yourself, because some images in the V-COCO evaluation set are contained in DETR's training set; QPIC must be pre-trained without those overlapping images for the V-COCO evaluation.
For HICO-DET, move the downloaded parameters to the params directory and convert them with the following command.
python convert_parameters.py \
--load_path params/detr-r50-e632da11.pth \
--save_path params/detr-r50-pre-hico.pth
For V-COCO, convert the pre-trained parameters with the following command.
python convert_parameters.py \
--load_path params/detr-r50-e632da11.pth \
--save_path params/detr-r50-pre-vcoco.pth \
--dataset vcoco
For VAW, convert the pre-trained parameters with the following command.
python convert_parameters.py \
--load_path params/detr-r50-e632da11.pth \
--save_path params/detr-r50-pre-vaw.pth \
--use_vaw
For MTL (attribute + HOI detection), convert the pre-trained parameters with the following command.
python convert_parameters.py \
--load_path params/detr-r50-e632da11.pth \
--save_path params/detr-r50-pre-mtl.pth \
--use_vaw \
--dataset vcoco
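The converted checkpoints are ordinary PyTorch checkpoint files, so they can be inspected as follows if you want to confirm the conversion ran (a minimal sketch; the assumption that the weights are stored under a "model" key follows the standard DETR checkpoint layout).

# inspect_checkpoint.py -- peek into a converted parameter file
import torch

ckpt = torch.load("params/detr-r50-pre-hico.pth", map_location="cpu")
print("top-level keys:", list(ckpt.keys()))

# DETR-style checkpoints usually keep the weights under "model"; fall back to the file itself otherwise
state_dict = ckpt.get("model", ckpt)
print("number of parameter tensors:", len(state_dict))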
To train the model on VAW only:
CUDA_VISIBLE_DEVICES=1,2 GPUS_PER_NODE=2 ./tool/run_dist_launch.sh 2 configs/mtl_train.sh \
--mtl_data [\'vaw\'] \
--output_dir checkpoints/vaw \
--pretrained params/detr-r50-pre-vaw.pth
To train the model jointly on V-COCO, HICO-DET, and VAW:
CUDA_VISIBLE_DEVICES=1,2 GPUS_PER_NODE=2 ./tool/run_dist_launch.sh 2 configs/mtl_train.sh \
--mtl_data [\'vcoco\',\'hico\',\'vaw\'] \
--output_dir checkpoints/mtl_all \
--pretrained params/detr-r50-pre-mtl.pth
To evaluate the trained multi-task model:
configs/mtl_eval.sh \
--pretrained checkpoints/mtl_all/checkpoint.pth \
--output_dir test_results/ \
--mtl_data [\'vcoco\',\'hico\',\'vaw\']
"test_mAP_all": 0.5455718251429631, "test_mAP_thesis": 0.5663461447990525
"test_mAP": 0.27877264960450454, "test_mAP rare": 0.20416854381834068, "test_mAP non-rare": 0.30105699289128085, "test_mean max recall": 0.6536133057960736
"test_mAP": 0.0524253535493328, "test_mAP rare": 0.029776368059209662, "test_mAP non-rare": 0.07093753803669373, "test_mean max recall": 0.37233911507467743
The trained model can also be run on a video for visualization. The following commands run the demo for each inference type.
python vis_demo.py \
--checkpoint checkpoints/mtl_all/checkpoint.pth \
--inf_type vcoco \
--mtl_data [\'vcoco\'] \
--mtl \
--video_file video/cycle.mp4 \
--show_vid \
--top_k 2 \
--threshold 0.4 \
--fps 30
python vis_demo.py \
--checkpoint checkpoints/mtl_all/checkpoint.pth \
--inf_type hico \
--mtl_data [\'hico\'] \
--mtl \
--video_file video/cycle.mp4 \
--show_vid \
--top_k 2 \
--threshold 0.4 \
--fps 30
python vis_demo.py \
--checkpoint checkpoints/mtl_all/checkpoint.pth \
--inf_type [\'hico\',\'vcoco\'] \
--mtl_data [\'hico\',\'vcoco\'] \
--mtl \
--video_file video/cycle.mp4 \
--show_vid \
--top_k 2 \
--threshold 0.4 \
--fps 30
python vis_demo.py \
--checkpoint checkpoints/mtl_all/checkpoint.pth \
--inf_type vaw \
--mtl_data [\'vaw\'] \
--mtl \
--video_file video/animal.mp4 \
--show_vid \
--top_k 2 \
--threshold 0.4 \
--fps 30
python vis_demo.py \
--checkpoint checkpoints/mtl_all/checkpoint.pth \
--inf_type vaw \
--mtl_data [\'vaw\'] \
--mtl \
--video_file video/animal.mp4 \
--show_vid \
--top_k 2 \
--threshold 0.4 \
--fps 30 \
--color
python vis_demo2.py \
--checkpoint checkpoints/mtl_all/checkpoint.pth \
--inf_type [\'vcoco\',\'vaw\'] \
--mtl_data [\'vcoco\',\'vaw\'] \
--mtl \
--video_file video/cycle.mp4 \
--show_vid \
--top_k 2 \
--threshold 0.4 \
--fps 30 \
--all
Our implementation is based on the official code of QPIC:
@inproceedings{tamura_cvpr2021,
author = {Tamura, Masato and Ohashi, Hiroki and Yoshinaga, Tomoaki},
title = {{QPIC}: Query-Based Pairwise Human-Object Interaction Detection with Image-Wide Contextual Information},
booktitle={CVPR},
year = {2021},
}