Code for "Optimizing Hand Region Detection in MediaPipe Holistic Full-Body Pose Estimation to Improve Accuracy and Avoid Downstream Errors".
Fixing google-ai-edge/mediapipe#5373
The MediaPipe ROI estimation can be not so great... Here are a few examples:
Worst and Best are the edge cases as seen in the data. The rest were picked manually. We only look at the right hand. If the data is for the left hand, we flip the image. In the following table, for each image we show the hand keypoints in green lines, and two bounding boxes in green - gold ROI and red - predicted ROI. To indicate the orientation of the bounding box, we draw a blue line on the bottom edge of each box.
Worst | Bad | OK | Good | Best |
---|---|---|---|---|
ROI: 0.08% | --- | --- | --- | ROI: 93.7% |
git clone https://github.com/sign-language-processing/mediapipe-hand-crop-fix.git
cd mediapipe-hand-crop-fix
Download and extract the Panoptic Hand Pose Dataset:
cd data
wget http://domedb.perception.cs.cmu.edu/panopticDB/hands/hand_labels.zip
Estimate full body poses using MediaPipe Holistic:
python -m mediapipe_crop_estimate.estimate_poses
Collect the annotations as well as estimated regions of interest:
python -m mediapipe_crop_estimate.collect_hands
Then, train an MLP using the annotations and estimated regions of interest:
python -m mediapipe_crop_estimate.train_mlp
Finally, evaluate:
python -m mediapipe_crop_estimate.evaluate