
VNect

This is an unofficial tensorflow implementation of VNect: Real-time 3D Human Pose Estimation with a Single RGB Camera.

For the caffe model required by this repository, please contact the author of the paper.

Environments

  • Python 3
    • tensorflow v2 (2.0.0+)
    • pycaffe
    • matplotlib 3.0.0 or 3.0.2 (v3.0.2 occasionally shuts down for unknown reasons)

Setup

Fedora 29

Install python dependencies:

pip3 install -r requirements.txt --user

Install caffe dependencies

sudo dnf install protobuf-devel leveldb-devel snappy-devel opencv-devel boost-devel hdf5-devel glog-devel gflags-devel lmdb-devel atlas-devel python-lxml boost-python3-devel

Setup Caffe

git clone https://github.com/BVLC/caffe.git
cd caffe

Configure Makefile.config (Include python3 and fix path)

Build Caffe

sudo make all
sudo make runtest
sudo make pycaffe
sudo make distribute
sudo cp -a .build_release/lib/. /usr/lib64
sudo cp -a distribute/python/caffe/ /usr/lib/python3.7/site-packages/

Usage

Preparation

  1. Drop the caffe model into models/caffe_model.
  2. Run init_weights.py to generate the tensorflow model.
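
The conversion performed by init_weights.py is not shown here, but the core of any caffe-to-tensorflow weight transfer is reordering the convolution kernel axes: caffe stores them as (out_channels, in_channels, height, width) while tensorflow expects (height, width, in_channels, out_channels). A minimal sketch of that transposition (the function name and shapes are illustrative, not taken from the repository):

```python
import numpy as np

def caffe_conv_to_tf(weights):
    """Transpose a caffe conv kernel (out_ch, in_ch, h, w)
    into tensorflow's layout (h, w, in_ch, out_ch)."""
    return np.transpose(weights, (2, 3, 1, 0))

# Hypothetical 3x3 kernel with 64 input and 128 output channels:
caffe_kernel = np.zeros((128, 64, 3, 3), dtype=np.float32)
tf_kernel = caffe_conv_to_tf(caffe_kernel)
print(tf_kernel.shape)  # (3, 3, 64, 128)
```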

Application

  1. (Deprecated) benchmark.py is a class implementation containing all the elements needed to run the model.

  2. run_estimator.py is a script for running on a video stream.

  3. (Recommended) run_estimator_ps.py is a multiprocessing version of the script. If the 3d plotting function shuts down in run_estimator.py mentioned above, try this one.

  4. run_estimator_robot.py additionally provides a ROS network and/or serial connection for communication in robot control.

  5. [NOTE] To run the video-stream-based scripts mentioned above:

    i ) click the left mouse button to confirm a simple static bounding box generated by the HOG method;

    ii) press any key to exit while the network is running.

  6. run_pic.py is a script for running on a single picture; the outputs are 4×21 heatmaps and the 2D results.
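
Recovering 2D joint positions from the per-joint heatmaps boils down to taking the argmax of each channel. A minimal sketch, assuming the heatmaps arrive as an (h, w, 21) array (the actual shapes and names in the repository's scripts may differ):

```python
import numpy as np

def extract_2d_joints(heatmaps):
    """Take the argmax of each heatmap channel (shape h x w x n_joints)
    and return per-joint (x, y) pixel coordinates."""
    h, w, n_joints = heatmaps.shape
    flat_idx = heatmaps.reshape(-1, n_joints).argmax(axis=0)
    ys, xs = np.unravel_index(flat_idx, (h, w))
    return np.stack([xs, ys], axis=1)  # shape (n_joints, 2)

# Toy heatmaps with a known peak for joint 0 at (x=5, y=3):
hm = np.zeros((46, 46, 21), dtype=np.float32)
hm[3, 5, 0] = 1.0
print(extract_2d_joints(hm)[0])  # [5 3]
```

In practice the repository refines this with sub-pixel optimization (see the speed note below in Notes), but the argmax is the starting point.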

Notes

  1. I don't know why in some cases the 3d plotting function (from matplotlib) shuts down in the script. It may depend on the programming environment. A pull request fixing this would be greatly appreciated.
  2. The input image in this implementation is in BGR color format (cv2.imread()), and the pixel values are normalized into the range [-0.4, 0.6).
  3. The joint-parent map (detailed information in materials/joint_index.xlsx):
  4. I drew a diagram to show the joint positions (don't laugh):
  5. Every input image is assumed to contain 21 joints to be found, so the model easily produces wrong results when a joint is actually not in the input.
  6. In some cases the estimation results are not as good as those shown in the paper author's promotional video.
  7. UPDATE: the running speed is now faster thanks to some coordinate-extraction optimization!
  8. The training script train.py is not complete yet (I failed to reconstruct the model), so do not use it. I may fix it in the future; pull requests are also welcome.
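
The input normalization in Note 2 can be sketched in a few lines. One plausible mapping consistent with the half-open interval [-0.4, 0.6) is dividing the 8-bit pixel values by 256 and shifting by -0.4; this is an assumption for illustration, and the repository's exact scaling may differ:

```python
import numpy as np

def normalize(img_bgr):
    """Map 8-bit BGR pixel values [0, 255] into [-0.4, 0.6):
    divide by 256, then shift by -0.4 (assumed scaling)."""
    return img_bgr.astype(np.float32) / 256.0 - 0.4

# Dummy BGR frame standing in for cv2.imread(); 0 and 255 mark the extremes.
frame = np.zeros((368, 368, 3), dtype=np.uint8)
frame[0, 0] = 255
out = normalize(frame)
print(out.min(), out.max())
```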

TODO

  1. Implement a better bounding box strategy.
  2. Implement the training script.

About Training Data

For the MPI-INF-3DHP dataset, refer to my other repository.

Reference Repositories