V2V-PoseNet: Voxel-to-Voxel Prediction Network for Accurate 3D Hand and Human Pose Estimation from a Single Depth Map
Introduction
This is our project repository for the paper, V2V-PoseNet: Voxel-to-Voxel Prediction Network for Accurate 3D Hand and Human Pose Estimation from a Single Depth Map (CVPR 2018).
We, Team SNU CVLAB (Gyeongsik Moon, Juyong Chang, and Kyoung Mu Lee of the Computer Vision Lab, Seoul National University), are the winners of the HANDS2017 Challenge on frame-based 3D hand pose estimation.
Please refer to our paper for details.
If you find our work useful in your research or publication, please cite our work:
[1] Moon, Gyeongsik, Ju Yong Chang, and Kyoung Mu Lee. "V2V-PoseNet: Voxel-to-Voxel Prediction Network for Accurate 3D Hand and Human Pose Estimation from a Single Depth Map." CVPR 2018. [arXiv]
@InProceedings{Moon_2018_CVPR_V2V-PoseNet,
author = {Moon, Gyeongsik and Chang, Juyong and Lee, Kyoung Mu},
title = {V2V-PoseNet: Voxel-to-Voxel Prediction Network for Accurate 3D Hand and Human Pose Estimation from a Single Depth Map},
booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
year = {2018}
}
In this repository, we provide
- Our model architecture description (V2V-PoseNet)
- HANDS2017 frame-based 3D hand pose estimation Challenge Results
- Comparison with the previous state-of-the-art methods
- Training code
- Datasets we used (ICVL, NYU, MSRA, ITOP)
- Trained models and estimated results
- 3D hand and human pose estimation examples
Model Architecture
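The network takes a voxelized depth map as input and outputs a per-voxel likelihood (3D heatmap) for each keypoint; please see the paper for the exact architecture. As a rough, illustrative numpy sketch of the surrounding steps only (voxelizing the depth points around the precomputed reference point, and reading 3D coordinates back from per-joint heatmaps), the snippet below may help. The grid resolution, voxel size, and function names are illustrative assumptions, not the repository's actual parameters or code.

```python
# Illustrative sketch of the voxel-to-voxel pipeline around the network:
# 3D points -> occupancy grid, per-joint 3D heatmaps -> world coordinates.
# GRID and VOXEL_MM are assumptions for illustration only.
import numpy as np

GRID = 88          # assumed cubic grid resolution (voxels per side)
VOXEL_MM = 5.0     # assumed voxel edge length in mm

def voxelize(points_mm, center_mm, grid=GRID, voxel_mm=VOXEL_MM):
    """Convert 3D points (N x 3, world mm) into a binary occupancy grid
    centered on the precomputed reference point."""
    occ = np.zeros((grid, grid, grid), dtype=np.float32)
    # shift points so the reference point maps to the grid center
    idx = np.floor((points_mm - center_mm) / voxel_mm + grid / 2).astype(int)
    valid = np.all((idx >= 0) & (idx < grid), axis=1)
    occ[idx[valid, 0], idx[valid, 1], idx[valid, 2]] = 1.0
    return occ

def heatmaps_to_coords(heatmaps, center_mm, grid=GRID, voxel_mm=VOXEL_MM):
    """Take per-joint 3D heatmaps (J x grid x grid x grid) and return the
    world coordinates (J x 3, mm) of the most likely voxel per joint."""
    coords = []
    for hm in heatmaps:
        vx = np.array(np.unravel_index(np.argmax(hm), hm.shape), dtype=np.float32)
        coords.append((vx - grid / 2) * voxel_mm + center_mm)
    return np.stack(coords)
```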
HANDS2017 frame-based 3D hand pose estimation Challenge Results
Comparison with the previous state-of-the-art methods
About our code
Dependencies
Our code is tested on Ubuntu 14.04 and 16.04 with NVIDIA Titan X GPUs (12 GB VRAM). The code is written in Lua and runs on Torch7 (the `th` command used below).
Code
Clone this repository anywhere you like. You may follow the example below.

```bash
makeReposit=[/the/directory/as/you/wish]
mkdir -p $makeReposit/; cd $makeReposit/
git clone https://github.com/mks0601/V2V-PoseNet_RELEASE.git
```
- The `src` folder contains the Lua scripts for the data loader, trainer, tester, and other utilities.
- The `data` folder contains the data converter, which converts image files to binary files (a hedged sketch of the general idea is shown below).
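For orientation only, one common way to dump a depth image into a flat binary file looks like the sketch below. This is an assumption about the general idea; the repository's converter may use a different file layout (header, byte order, channel encoding, etc.), so please consult the scripts in `data/`.

```python
# Hypothetical sketch: dump a 16-bit depth PNG as a raw float32 array (mm).
# The actual binary layout used by the converter in `data/` may differ.
import numpy as np
from PIL import Image

def depth_png_to_bin(png_path, bin_path):
    depth = np.array(Image.open(png_path), dtype=np.float32)  # depth values
    depth.tofile(bin_path)  # flat row-major float32 array, no header

depth_png_to_bin("frame_0001.png", "frame_0001.bin")  # hypothetical filenames
```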
This code is for 3D hand pose estimation only. You can modify it slightly to train and test on the ITOP dataset; if you need the code for that, please contact me.
Dataset
We trained and tested our model on four 3D hand pose estimation datasets and one 3D human pose estimation dataset.
- ICVL Hand Posture Dataset [link] [paper]
- NYU Hand Pose Dataset [link] [paper]
- MSRA Hand Pose Dataset [link] [paper]
- HANDS2017 Challenge Dataset [link] [paper]
- ITOP Human Pose Dataset [link] [paper]
Training
- To train our model, run the following command:

```bash
th run_me.lua
```

- There are optional configurations you can adjust in `config.lua`.
Results
Here we provide the precomputed centers and the estimated 3D coordinates.
The precomputed centers are obtained by training the center estimation network from DeepPrior++. Each line contains the 3D world coordinates of the reference point (center) for one frame (a minimal loading sketch is given after the list below).
The 3D coordinates estimated on the ICVL, NYU, and MSRA datasets are in pixel coordinates, while those estimated on the ITOP dataset are in world coordinates.
If you want a pretrained model, please contact me.
- ICVL Hand Posture Dataset [center_trainset] [center_testset] [estimation]
- NYU Hand Pose Dataset [center_trainset] [center_testset] [estimation]
- MSRA Hand Pose Dataset [center] [estimation]
- ITOP Human Pose Dataset (front-view) [center_trainset] [center_testset] [estimation]
- ITOP Human Pose Dataset (side-view) [center_trainset] [center_testset] [estimation]
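The sketch below shows one way the center files could be loaded, assuming each line holds a single whitespace-separated "x y z" triple; the filename is hypothetical and the exact layout is not specified here, so please check the downloaded files.

```python
# Minimal sketch for loading a precomputed center file, assuming each line
# holds one whitespace-separated "x y z" triple (one reference point per frame).
# Lines that are not numeric (e.g., frames with no detection) would need
# extra handling before np.loadtxt can parse the file.
import numpy as np

centers = np.loadtxt("center_test.txt")  # hypothetical filename
assert centers.ndim == 2 and centers.shape[1] == 3
print("frames:", centers.shape[0], "first center:", centers[0])
```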
We used awesome-hand-pose-estimation to evaluate the accuracy of V2V-PoseNet on the ICVL, NYU, and MSRA datasets.
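For a quick sanity check outside that toolkit, the standard mean per-joint 3D distance error can be computed as sketched below, assuming `pred` and `gt` are arrays of shape (num_frames, num_joints, 3) in the same (world) coordinate system and units.

```python
# Sketch of the mean per-joint 3D error, assuming pred/gt arrays of shape
# (num_frames, num_joints, 3) in the same world coordinates (e.g., mm).
import numpy as np

def mean_joint_error(pred, gt):
    # Euclidean distance per joint per frame, averaged over everything.
    return np.mean(np.linalg.norm(pred - gt, axis=2))
```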