
Contextual Attention for Hand Detection in the Wild

This repository contains the code and datasets related to the following paper:

Contextual Attention for Hand Detection in the Wild, International Conference on Computer Vision (ICCV), 2019.

Please also see the project website.

Contents

This repository contains the following:

  • A Keras implementation of the proposed attention method.
  • Code to train and evaluate the Hand-CNN. This code is based on Matterport's Keras implementation of Mask-RCNN.
  • Two large-scale annotated hand datasets, TV-Hand and COCO-Hand, containing hands in unconstrained images for training and evaluation.
  • Models trained on TV-Hand and COCO-Hand datasets.

Installation

Install the required dependencies listed in requirements.txt.
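For example, with pip:

pip install -r requirements.txt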

Data annotation format

Create annotations in a .txt file in the following format:

/path/to/image/, x_min, x_max, y_min, y_max, x1, y1, x2, y2, x3, y3, x4, y4, hand, where

  • (x1, y1), (x2, y2), (x3, y3), and (x4, y4) are the coordinates of the (rotated) polygon bounding box for a hand, listed in anti-clockwise order.
  • The wrist is given by the line joining (x1, y1) and (x2, y2).
  • x_min = min(x1, x2, x3, x4), x_max = max(x1, x2, x3, x4), y_min = min(y1, y2, y3, y4), y_max = max(y1, y2, y3, y4).
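For reference, a minimal Python sketch of parsing one annotation line; the function name is illustrative and not part of the released code:

def parse_annotation(line):
    """Parse one line of the annotation format described above."""
    parts = [p.strip() for p in line.split(",")]
    image_path, label = parts[0], parts[13]
    x_min, x_max, y_min, y_max = map(float, parts[1:5])
    x1, y1, x2, y2, x3, y3, x4, y4 = map(float, parts[5:13])
    polygon = [(x1, y1), (x2, y2), (x3, y3), (x4, y4)]
    # The axis-aligned box is the tight hull of the rotated polygon,
    # so it can also be recomputed directly from the four corners:
    box = (min(x1, x2, x3, x4), max(x1, x2, x3, x4),
           min(y1, y2, y3, y4), max(y1, y2, y3, y4))
    return image_path, box, polygon, label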

Folder structure

The code is organized in the following structure:

datasets/
  annotations/
    train_annotations.txt
    val_annotations.txt
    test_annotations.txt

mrcnn/
  config.py
  model.py
  parallel_model.py
  utils.py
  visualize.py
  contextual_attention.py

samples/
  hand/
    results/
    hand.py
    load_weights.py

model/
  mask_rcnn_coco.h5
  trained_weights.h5

requirements.txt
setup.cfg
setup.py
  • ./datasets/annotations/: create the annotations for the training, validation, and testing data in the format described above.
  • ./mrcnn/: this directory contains the main Mask-RCNN code. The file ./mrcnn/model.py is modified to add a new orientation branch to Mask-RCNN that predicts the orientation of the hand.
  • ./mrcnn/contextual_attention.py: this file implements the proposed attention method.
  • ./samples/hand/hand.py: code for training and evaluating Hand-CNN.
  • ./model/mask_rcnn_coco.h5: pretrained Mask-RCNN weights on COCO dataset.
  • ./model/trained_weights.h5: model trained on TV-Hand and COCO-Hand data.

Models

Download the models from the models link and place them in ./model/.

Training

Use the following command to train Hand-CNN:

python -W ignore samples/hand/hand.py --weight coco --command train

  • --weight: If coco, training starts from the pretrained COCO weights. Alternatively, /path/to/weights/ can be specified to start training from other weights.

The training set to be used can be specified in the train function of the file ./samples/hand/hand.py.
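For example, to resume training from the released weights instead of the COCO initialization (weight path as in the folder structure above):

python -W ignore samples/hand/hand.py --weight model/trained_weights.h5 --command train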

Detection

Use the following to run detection on images and visualize them:

python -W ignore detect.py --image_dir /path/to/folder_containing_images/

The outputs will be stored in ./outputs/.
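To run detection from Python instead, here is a minimal sketch that assumes the standard Matterport Mask-RCNN inference interface is preserved by this repository; the config values, class count, log directory, and image path are illustrative:

import skimage.io
import mrcnn.model as modellib
from mrcnn.config import Config

# Illustrative inference config; NUM_CLASSES assumes background + hand.
class HandInferenceConfig(Config):
    NAME = "hand"
    GPU_COUNT = 1
    IMAGES_PER_GPU = 1
    NUM_CLASSES = 1 + 1

config = HandInferenceConfig()
model = modellib.MaskRCNN(mode="inference", config=config, model_dir="./logs")
model.load_weights("model/trained_weights.h5", by_name=True)

image = skimage.io.imread("/path/to/image.jpg")
results = model.detect([image], verbose=0)
r = results[0]
print(r["rois"], r["scores"])  # axis-aligned boxes and confidence scores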

Evaluation

Use the following command to evaluate a trained Hand-CNN:

python -W ignore samples/hand/hand.py --weight path/to/weights --command test --testset oxford

  • --weight: Path to the trained weights to be used for evaluation.
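For example, to evaluate the released model (presumably on the Oxford hand test set, given the oxford value of --testset), with the weight path as in the folder structure above:

python -W ignore samples/hand/hand.py --weight model/trained_weights.h5 --command test --testset oxford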

Datasets

As part of this project, we release two datasets: TV-Hand and COCO-Hand. The TV-Hand dataset contains hand annotations for 9.5K image frames extracted from the ActionThread dataset. The COCO-Hand dataset contains annotations for 25K images from Microsoft's COCO dataset.

Some images and annotations from the TV-Hand data

Some images and annotations from the COCO-Hand data

Citation

If you use the code or datasets in this repository, please cite the following paper:

@inproceedings{Hand-CNN,
  title={Contextual Attention for Hand Detection in the Wild},
  author={Supreeth Narasimhaswamy and Zhengwei Wei and Yang Wang and Justin Zhang and Minh Hoai},
  booktitle={International Conference on Computer Vision (ICCV)},
  year={2019},
  url={https://arxiv.org/pdf/1904.04882.pdf} 
}