This project addresses the hand pose estimation problem. We tested the Stacked Hourglass Network (a fairly well-known model for human pose estimation) and, in addition, switched from the usual bottom-up approach to a top-down one by adding a hand-detection module. Here is the architecture we use:
- python==3.8.16
- Install PyTorch with CUDA 11.7 following the official instructions:
conda install pytorch torchvision torchaudio pytorch-cuda=11.7 -c pytorch -c nvidia
- Install the necessary dependencies by running:
pip install -r requirements.txt
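After installation, a quick sanity check can confirm that the interpreter and PyTorch match the setup above. The following is a minimal sketch, not part of the repo: the function name `check_environment` is hypothetical, and it only reports what it finds rather than enforcing anything.

```python
import sys


def check_environment(expected_python=(3, 8)):
    """Report whether the interpreter and PyTorch match the setup above."""
    report = {
        "python_ok": sys.version_info[:2] == expected_python,
        "torch_installed": False,
        "cuda_available": False,
    }
    try:
        import torch  # only importable after the conda step above
    except ImportError:
        return report
    report["torch_installed"] = True
    report["cuda_available"] = bool(torch.cuda.is_available())
    return report


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name}: {ok}")
```

If `cuda_available` is False on a machine with a GPU, the installed PyTorch build or the driver is the first thing to check.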
Please organize your datasets for training and testing following this structure:
Main-folder/
│
├── data/
│   ├── FreiHAND_pub_v2 - contains the data for training the model
│   │   ├── ...
│   │
│   └── FreiHAND_pub_v2_eval - public test images
│       ├── ...
│
└── ...
- Put the downloaded FreiHAND dataset in ./data/
Link: https://lmb.informatik.uni-freiburg.de/data/freihand/FreiHAND_pub_v2.zip
- Put the downloaded FreiHAND evaluation set in ./data/
Link: https://lmb.informatik.uni-freiburg.de/data/freihand/FreiHAND_pub_v2_eval.zip
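Before training, it can help to confirm that both archives were extracted into the expected place. The sketch below only checks that the two folder names from the structure above exist under ./data/; the helper name `missing_dataset_dirs` is hypothetical and the script does not inspect the folder contents.

```python
from pathlib import Path

# Folder names taken from the directory structure above.
EXPECTED_DIRS = ("FreiHAND_pub_v2", "FreiHAND_pub_v2_eval")


def missing_dataset_dirs(data_root="./data"):
    """Return the expected dataset folders that are absent under data_root."""
    root = Path(data_root)
    return [name for name in EXPECTED_DIRS if not (root / name).is_dir()]


if __name__ == "__main__":
    missing = missing_dataset_dirs()
    if missing:
        print("Missing under ./data/:", ", ".join(missing))
    else:
        print("Dataset folders found.")
```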
In this project, we focus on training the Stacked Hourglass Network. For the hand-detection module, we use victordibia's pretrained model (SSD) without further modification. Train the hourglass network:
python 1.train.py --config-file "configs/train_FreiHAND_dataset.yaml"
The trained model weights (net_hm.pth) will be saved in Main-folder/. Copy the trained model into ./model/trained_models before evaluating.
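The copy step can also be scripted. This is a minimal sketch assuming it is run from Main-folder/, with the file name and destination taken from the text above; the helper name `install_weights` is hypothetical and not part of the repo.

```python
import shutil
from pathlib import Path


def install_weights(src="net_hm.pth", dst_dir="model/trained_models"):
    """Copy the trained weights into the folder named in the step above."""
    src_path = Path(src)
    if not src_path.is_file():
        raise FileNotFoundError(f"Trained weights not found: {src_path}")
    dst = Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)  # create the folder if it is missing
    return Path(shutil.copy2(src_path, dst / src_path.name))


if __name__ == "__main__":
    try:
        print("Copied to", install_weights())
    except FileNotFoundError as err:
        print(err)
```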
Evaluate on FreiHAND dataset:
python 2.evaluate_FreiHAND.py --config-file "configs/eval_FreiHAND_dataset.yaml"
The visualization results will be saved to ./output/
Prepare a camera with a clear viewing angle, good lighting, and an uncluttered scene. Run the following command:
python 3.real_time_2D_hand_pose_estimation.py --config-file "configs/eval_webcam.yaml"
Note: Our model only handles single-hand recognition. If two or more hands are in view, the model will pick one of them at random to predict. To predict multiple hands, please edit the file 3.real_time_2D_hand_pose_estimation.py (because of resource and time limitations, we did not implement this part).
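One possible way to approach the multi-hand extension is to keep the best few detections and run the single-hand estimator once per box. The sketch below is illustrative only: the detection format (a box plus a confidence score), the function names `select_hands` and `estimate_all`, and the default threshold and hand count are all assumptions, not the interface of 3.real_time_2D_hand_pose_estimation.py or of the SSD detector.

```python
from typing import Callable, List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max), assumed format
Detection = Tuple[Box, float]            # (box, confidence score), assumed format


def select_hands(detections: Sequence[Detection],
                 score_threshold: float = 0.5,
                 max_hands: int = 2) -> List[Box]:
    """Keep the highest-scoring boxes at or above the threshold, best first."""
    kept = [d for d in detections if d[1] >= score_threshold]
    kept.sort(key=lambda d: d[1], reverse=True)
    return [box for box, _ in kept[:max_hands]]


def estimate_all(detections: Sequence[Detection],
                 estimate_pose: Callable[[Box], object],
                 **kwargs) -> list:
    """Run a single-hand pose estimator once per selected box."""
    return [estimate_pose(box) for box in select_hands(detections, **kwargs)]
```

With `max_hands=1` the same helper gives a deterministic choice (the most confident hand) instead of a random one.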
To tune the hyperparameters (BATCH_SIZE, NUM_WORKERS, DATA_SIZE, ...), edit the .yaml files in the ./configs/ directory.
This repo builds on the work of victordibia and enghock1. Thanks for their contributions.