Realtime Multiperson Pose Estimation
C++ code repo for the ECCV 2016 demo, "Realtime Multiperson Pose Estimation", by Zhe Cao, Shih-En Wei, Tomas Simon, and Yaser Sheikh. Thanks to Ginés Hidalgo Martínez for restructuring the code.
The full project repo includes MATLAB and Python versions, as well as the training code.
This project is licensed under the terms of the GPL v3 license.
Quick Start
1. Required: CUDA and cuDNN installed on your machine.
2. If you have OpenCV 2.4 installed on your system, go to step 3. If you are using OpenCV 3, uncomment the line
# OPENCV_VERSION := 3
in the file Makefile.config.Ubuntu14.example (for Ubuntu 14) and/or Makefile.config.Ubuntu16.example (for Ubuntu 15 or 16). In addition, OpenCV 3 does not include the opencv_contrib module by default. If you have installed it manually and need to use it, append opencv_contrib to the end of the line
LIBRARIES += opencv_core opencv_highgui opencv_imgproc
in the Makefile.
3. Build caffe and rtpose.bin, and download the required Caffe models (the script was tested on Ubuntu 14.04 and 16.04, and it uses all the available cores on your machine):
chmod u+x install_caffe_and_cpm.sh
./install_caffe_and_cpm.sh
Running on a video:
./build/examples/rtpose/rtpose.bin --video video_file.mp4
Running on your webcam:
./build/examples/rtpose/rtpose.bin
Important options:
--help <--- Display all the available options.
--video input.mp4 <--- Input video. If omitted, the webcam is used.
--camera # <--- Choose the webcam number (default: 0).
--image_dir path_to_images/ <--- Run on all jpg, png, or bmp images in path_to_images/. If omitted, the webcam is used.
--write_frames path/ <--- Render images with this prefix: path/frame%06d.jpg
--write_json path/ <--- Output JSON files with joints with this prefix: path/frame%06d.json
--no_frame_drops <--- Don't drop frames. Important when generating offline results.
--no_display <--- Don't open a display window. Useful if there's no X server.
--num_gpu 4 <--- Parallelize over this number of GPUs. Default is 1.
--num_scales 3 --scale_gap 0.15 <--- Use 3 scales: 1, (1-0.15), (1-0.15*2). Default is one scale = 1.
--net_resolution 656x368 --resolution 1280x720 <--- HD. These are the default values.
--net_resolution 496x368 --resolution 640x480 <--- VGA.
--logtostderr <--- Log messages to standard error.
Example:
Run on a video vid.mp4, render image frames as output/frame%06d.jpg and output JSON files as output/frame%06d.json, using 3 scales (1.00, 0.85, and 0.70), parallelized over 2 GPUs:
./build/examples/rtpose/rtpose.bin --video vid.mp4 --num_gpu 2 --no_frame_drops --write_frames output/ --write_json output/ --num_scales 3 --scale_gap 0.15
Output format:
Each JSON file has a bodies array of objects, where each object has an array joints containing the joint locations and detection confidence formatted as x1,y1,c1,x2,y2,c2,..., where c is the confidence in [0,1].
{
  "version":0.1,
  "bodies":[
    {"joints":[1114.15,160.396,0.846207,...]},
    {"joints":[...]}
  ]
}
where the joint order of the COCO parts is (see src/rtpose/modelDescriptorFactory.cpp):
part2name {
{0, "Nose"},
{1, "Neck"},
{2, "RShoulder"},
{3, "RElbow"},
{4, "RWrist"},
{5, "LShoulder"},
{6, "LElbow"},
{7, "LWrist"},
{8, "RHip"},
{9, "RKnee"},
{10, "RAnkle"},
{11, "LHip"},
{12, "LKnee"},
{13, "LAnkle"},
{14, "REye"},
{15, "LEye"},
{16, "REar"},
{17, "LEar"},
{18, "Bkg"},
}
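As a rough illustration of how the flat joints array maps onto the part order above, the sketch below groups consecutive (x, y, c) triples and labels them by index. It assumes the array has already been read from the JSON file into a std::vector<double> by a parser of your choice. Joint, kCocoPartNames and unpackJoints are hypothetical names, not part of rtpose.

```cpp
#include <array>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical container for one entry of the "joints" array.
struct Joint {
    std::string name;
    double x;
    double y;
    double confidence;  // c in [0,1]
};

// Part names in the order listed in the README.
const std::array<const char*, 19> kCocoPartNames = {{
    "Nose", "Neck", "RShoulder", "RElbow", "RWrist", "LShoulder", "LElbow",
    "LWrist", "RHip", "RKnee", "RAnkle", "LHip", "LKnee", "LAnkle",
    "REye", "LEye", "REar", "LEar", "Bkg"}};

// Groups a flat x1,y1,c1,x2,y2,c2,... array into labelled joints.
std::vector<Joint> unpackJoints(const std::vector<double>& flat)
{
    if (flat.size() % 3 != 0)
        throw std::invalid_argument("joints array must hold (x, y, c) triples");
    const std::size_t count = flat.size() / 3;
    if (count > kCocoPartNames.size())
        throw std::invalid_argument("more joints than known part names");

    std::vector<Joint> joints;
    joints.reserve(count);
    for (std::size_t i = 0; i < count; ++i)
        joints.push_back({kCocoPartNames[i], flat[3 * i], flat[3 * i + 1], flat[3 * i + 2]});
    return joints;
}
```

Joint i is then found at indices 3*i, 3*i+1 and 3*i+2 of the flat array.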
Custom Caffe:
We modified and added several Caffe files in include/caffe and src/caffe. In case you want to use your own Caffe distribution, these are the files we added and modified:
- Added folders in include/caffe and src/caffe: include/caffe/cpm and src/caffe/cpm.
- Modified files in include/caffe (search for // CPM extra code: to find the modified code): layers/base_data_layer.h.
- Modified files in src/caffe (search for // CPM extra code: to find the modified code): proto/caffe.proto, layers/base_data_layer.cpp, layers/base_data_layer.cu and util/blocking_queue.cpp.
- Replaced files: README.md.
- Added files: install_caffe_and_cpm.sh, Makefile.config.Ubuntu14.example (extracted from Makefile.config.example) and Makefile.config.Ubuntu16.example (extracted from Makefile.config.example).
- Other added folders: model/, examples/rtpose, /include/rtpose and /src/rtpose.
- Other modified files: Makefile.
- Optional - deleted Caffe files and folders (only to save space): Makefile.config.example, data/, examples/ (do not delete examples/rtpose) and models/.
Custom Caffe layers:
We created a few Caffe layers (located in include/caffe/cpm/layers and src/caffe/cpm/layers):
- CocoDetectDataLayer: Only used to load data for training (training code temporarily unavailable).
- CPMBottomUpDataLayer: Only used to load data for training (training code temporarily unavailable).
- CPMDataLayer: Only used to load data for training (training code temporarily unavailable).
- ImResizeLayer: Only used for testing (backward pass not implemented). This layer performs a 2-D resize over the 4-D data. That is, given a 4-D input of size (num x channels x height_input x width_input), the layer returns a 4-D output of size (num x channels x height_output x width_output). It is applied independently to each num and channels index. Its parameters are:
  - factor: Scaling factor with respect to the input width and height. factor is the alternative to the pair of variables [target_spatial_width, target_spatial_height]. If factor != 0, the latter are ignored.
  - scale_gap and start_scale: These parameters are related and are used for scale search in testing mode. If start_scale = 1 (default), the CNN input patch size is the net resolution (set with --net_resolution). scale_gap is the difference between consecutive scales. These parameters are related to the flag --num_scales. For instance, using --start_scale 1 --num_scales 3 --scale_gap 0.1 means using 3 scales: 1, 1-0.1, 1-2*0.1. The different patch sizes therefore correspond to the net resolution multiplied by these scale values.
  - target_spatial_height: Alternative to factor. It sets the output height. Ignored if factor != 0.
  - target_spatial_width: Alternative to factor. It sets the output width. Ignored if factor != 0.
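The precedence between factor and the target_spatial_* pair can be sketched as follows. This is a minimal illustration of the rule described above, not the layer's actual code. Size2D and resizeOutputSize are hypothetical names, and rounding to the nearest integer is an assumption.

```cpp
#include <cmath>
#include <stdexcept>

// Hypothetical 2-D size type for this sketch.
struct Size2D {
    int width;
    int height;
};

// Returns the spatial output size: factor wins whenever it is non-zero,
// otherwise the explicit target size is used.
Size2D resizeOutputSize(Size2D input, double factor,
                        int targetSpatialWidth, int targetSpatialHeight)
{
    if (factor != 0.0) {
        // Assumption: the scaled size is rounded to the nearest integer.
        return {static_cast<int>(std::lround(input.width * factor)),
                static_cast<int>(std::lround(input.height * factor))};
    }
    if (targetSpatialWidth <= 0 || targetSpatialHeight <= 0)
        throw std::invalid_argument("target size must be positive when factor == 0");
    return {targetSpatialWidth, targetSpatialHeight};
}
```

For example, a 656x368 input with factor = 0.5 gives 328x184 regardless of the target values, while factor = 0 returns the target size unchanged.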
- NmsLayer: Only used for testing (backward pass not implemented). This layer performs 3-D Non-Maximum Suppression over the 4-D data. That is, given a 4-D input of size (num x channels x height x width), it returns a 4-D output of size (num x num_parts x max_peaks+1 x 3). It is applied independently to each num index. The second dimension corresponds to the number of body parts (num_parts). The third dimension indicates the maximum number of peaks to be analyzed (max_peaks+1). Finally, the last one corresponds to the x, y and score values (3). Its parameters are:
  - max_peaks: The number of peaks to be considered. The last total_peaks - max_peaks peaks are discarded.
  - num_parts: The number of body parts to detect (e.g. 15 for MPI and 18 for COCO).
  - threshold: Any input value smaller than this threshold is set to 0.
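To make the roles of threshold and max_peaks concrete, here is a minimal CPU sketch of peak picking on a single heatmap channel. It is not the layer's implementation. The 8-neighbour comparison, the skipped border pixels and the row-major scan order are assumptions, and Peak and findPeaks are hypothetical names.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical peak record: the x, y and score triple described above.
struct Peak {
    int x;
    int y;
    float score;
};

// Scans one row-major heatmap channel and keeps at most maxPeaks local maxima.
std::vector<Peak> findPeaks(const std::vector<float>& heatmap, int width, int height,
                            float threshold, int maxPeaks)
{
    if (width <= 0 || height <= 0 ||
        heatmap.size() != static_cast<std::size_t>(width) * static_cast<std::size_t>(height))
        throw std::invalid_argument("heatmap size must equal width * height");

    // Values below the threshold are treated as 0.
    auto at = [&](int x, int y) {
        const float v = heatmap[static_cast<std::size_t>(y) * width + x];
        return v < threshold ? 0.0f : v;
    };

    std::vector<Peak> peaks;
    for (int y = 1; y < height - 1; ++y) {
        for (int x = 1; x < width - 1; ++x) {
            const float v = at(x, y);
            if (v <= 0.0f)
                continue;
            bool isPeak = true;
            // Assumption: a peak is strictly greater than its 8 neighbours.
            for (int dy = -1; dy <= 1 && isPeak; ++dy)
                for (int dx = -1; dx <= 1; ++dx)
                    if ((dx != 0 || dy != 0) && at(x + dx, y + dy) >= v) {
                        isPeak = false;
                        break;
                    }
            // Peaks found after the first maxPeaks are discarded.
            if (isPeak && static_cast<int>(peaks.size()) < maxPeaks)
                peaks.push_back({x, y, v});
        }
    }
    return peaks;
}
```

Running this once per body-part channel gives up to max_peaks (x, y, score) triples per part, which is the information the layer's output rows carry.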
Citation
Please cite the paper in your publications if it helps your research:
@article{cao2016realtime,
title={Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields},
author={Zhe Cao and Tomas Simon and Shih-En Wei and Yaser Sheikh},
journal={arXiv preprint arXiv:1611.08050},
year={2016}
}
@inproceedings{wei2016cpm,
author = {Shih-En Wei and Varun Ramakrishna and Takeo Kanade and Yaser Sheikh},
booktitle = {CVPR},
title = {Convolutional pose machines},
year = {2016}
}