A minimal solution for hand motion capture from a single color camera at over 100 fps. Easy to use, plug and play.
This is the official implementation of the paper "Monocular Real-time Hand Shape and Motion Capture using Multi-modal Data" (CVPR 2020).
This project provides the core components for hand motion capture (see the interface sketch after this list):
- estimating joint locations from a monocular RGB image (DetNet)
- estimating joint rotations from locations (IKNet)
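At the interface level, the two stages compose as in the sketch below. The shapes follow the paper's 21-joint hand; the 128x128 input resolution and the per-joint quaternion output are our reading of the paper, so treat them as assumptions. The functions are dummy stand-ins, not the real networks:

```python
# Shape-level sketch of the two-stage pipeline (dummy stand-ins only).
import numpy as np

def detnet(image):
    # DetNet: a CNN regressing 3D locations of the 21 hand joints from RGB.
    # 128x128 is the input resolution we believe the paper uses (assumption).
    assert image.shape == (128, 128, 3)
    return np.zeros((21, 3), dtype=np.float32)   # dummy joint locations

def iknet(joint_locations):
    # IKNet: a network regressing per-joint rotations (quaternions)
    # from the estimated joint locations.
    assert joint_locations.shape == (21, 3)
    quats = np.zeros((21, 4), dtype=np.float32)
    quats[:, 0] = 1.0                            # identity rotations as dummies
    return quats

rotations = iknet(detnet(np.zeros((128, 128, 3), dtype=np.float32)))
```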
We focus on:
- ease of use (all you need is a webcam)
- time efficiency (on our 1080Ti, 8.9ms for DetNet and 0.9ms for IKNet; see the timing sketch after this list)
- robustness to occlusion, hand-object interaction, fast motion, and changing scale and viewpoint
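If you want to reproduce such per-frame timings on your own hardware, a generic measurement loop looks like the sketch below; `run_detnet` is a hypothetical stand-in for whatever inference call you are timing:

```python
import time

def benchmark(fn, warmup=10, iters=100):
    # Warm-up iterations exclude one-off costs (graph building, allocation).
    for _ in range(warmup):
        fn()
    start = time.perf_counter()
    for _ in range(iters):
        fn()
    return (time.perf_counter() - start) / iters * 1000.0  # mean ms per call

# Example (run_detnet is hypothetical):
# print(f'{benchmark(run_detnet):.1f} ms')
```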
Some links: [paper] [video] [supp doc] [webpage]
To install the dependencies, please check `requirements.txt`. All of them are available via pip and conda (for example, `pip install -r requirements.txt`).
To prepare the MANO hand model:
- Download the MANO model from here and unzip it.
- In `config.py`, set `OFFICIAL_MANO_PATH` to the left hand model (see the sketch after this list).
- Run `python prepare_mano.py`; you will get a converted MANO model that is compatible with this project at `config.HAND_MESH_MODEL_PATH`.
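For orientation, the two settings involved might look like this. `OFFICIAL_MANO_PATH` and `HAND_MESH_MODEL_PATH` are the names referenced above, but the path values below are assumptions, not the verbatim file:

```python
# Illustrative excerpt of config.py -- values are placeholders.
# Point this at the left-hand model inside the unzipped official MANO
# release (shipped as models/MANO_LEFT.pkl in the official download).
OFFICIAL_MANO_PATH = './mano_v1_2/models/MANO_LEFT.pkl'

# Where prepare_mano.py writes the converted, project-compatible model;
# the actual default is defined in config.py.
HAND_MESH_MODEL_PATH = './model/hand_mesh/hand_mesh_model.pkl'
```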
To prepare the pre-trained network models:
- Download the models from here.
- Put `detnet.ckpt.*` in `model/detnet`, and `iknet.ckpt.*` in `model/iknet`.
- Check `config.py` and make sure all required files are there (a small check script is sketched after this list).
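A quick, optional way to verify the layout, assuming the directory names above. This helper is not part of the project:

```python
# check_files.py -- hypothetical helper, not shipped with this project.
# Verifies that the pre-trained checkpoints are where the demo expects them.
import glob
import os

def check(pattern):
    # TensorFlow checkpoints are sharded into several files (.index, .data-*),
    # hence the wildcard match.
    matches = glob.glob(pattern)
    print(('OK ' if matches else 'MISSING ') + pattern)
    return bool(matches)

if __name__ == '__main__':
    ok = all([
        check(os.path.join('model', 'detnet', 'detnet.ckpt.*')),
        check(os.path.join('model', 'iknet', 'iknet.ckpt.*')),
    ])
    raise SystemExit(0 if ok else 1)
```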
To run the demo, execute `python app.py` (a minimal sketch of the capture loop is shown below). Then:
- Put your right hand in front of the camera. The pre-trained model is for the left hand, but the input is flipped internally.
- Press `ESC` to quit.
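For reference, the capture side of such a demo typically looks like this. Only the OpenCV calls below are standard; the real `app.py` additionally runs the networks and renders the hand:

```python
# Minimal webcam loop sketch (OpenCV only; the real app.py does more).
import cv2

cap = cv2.VideoCapture(0)  # default webcam
while True:
    ok, frame = cap.read()
    if not ok:
        break
    # The pre-trained model expects a left hand; mirroring the image lets
    # you show your right hand instead, as the demo does internally.
    frame = cv2.flip(frame, 1)  # flip around the vertical axis
    cv2.imshow('hand capture', frame)
    if cv2.waitKey(1) & 0xFF == 27:  # 27 == ESC
        break
cap.release()
cv2.destroyAllWindows()
```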
To use the models in your own project, please check `wrappers.py`; a hypothetical usage sketch follows.
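A rough illustration of the intended call pattern. The `ModelPipeline` name and its `process` method are assumptions made for this sketch; the real class and method names are defined in `wrappers.py`:

```python
# Hypothetical usage sketch -- check wrappers.py for the actual interface.
import numpy as np
from wrappers import ModelPipeline  # assumed name, not verified

model = ModelPipeline()

# A 128x128 RGB crop of a (left) hand; the demo flips right-hand input.
frame = np.zeros((128, 128, 3), dtype=np.uint8)

# Stage 1 (DetNet) yields joint locations, stage 2 (IKNet) joint rotations.
joint_locations, joint_rotations = model.process(frame)
```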
If you find the project helpful, please consider citing us:
@inproceedings{zhou2020monocular,
  title={Monocular Real-time Hand Shape and Motion Capture using Multi-modal Data},
  author={Zhou, Yuxiao and Habermann, Marc and Xu, Weipeng and Habibie, Ikhsanul and Theobalt, Christian and Xu, Feng},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={0--0},
  year={2020}
}
We also provide an optimization-based IK solver here.
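For readers unfamiliar with the idea, optimization-based IK fits joint rotations by minimizing the distance between the joints a forward-kinematics model predicts and the target joint locations. Below is a generic toy example on a planar two-bone chain, unrelated to the linked solver's actual formulation:

```python
# Toy optimization-based IK: recover two joint angles of a planar 2-bone
# chain so its end effector reaches a target. Generic illustration only.
import numpy as np
from scipy.optimize import least_squares

BONE_LENGTHS = (1.0, 1.0)

def forward_kinematics(angles):
    # End-effector position of the 2-bone planar chain.
    a1, a2 = angles
    elbow = BONE_LENGTHS[0] * np.array([np.cos(a1), np.sin(a1)])
    hand = elbow + BONE_LENGTHS[1] * np.array([np.cos(a1 + a2), np.sin(a1 + a2)])
    return hand

target = np.array([1.2, 0.9])
result = least_squares(lambda a: forward_kinematics(a) - target, x0=np.zeros(2))
print('angles (rad):', result.x, 'reached:', forward_kinematics(result.x))
```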
The detection model is trained with several public image datasets, while the IK model is trained with the poses shipped with MANO. Please check our paper for more details on the datasets and training.