This project demonstrates a real-time hand gesture recognition system built with MediaPipe and OpenCV. The system detects hands in a live video feed, draws hand landmarks in a different color for each hand, and recognizes basic gestures such as "Thumbs Up" and "Thumbs Down."
- Real-time hand detection and gesture recognition.
- Different color codes for left and right hands.
- Easy-to-extend gesture recognition logic.
```python
import cv2
import mediapipe as mp
```
`cv2` is OpenCV, a library for computer vision tasks. `mediapipe` is a library by Google for building machine learning solutions, such as hand tracking.
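Both are available on PyPI; a typical install is `pip install opencv-python mediapipe`.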
```python
mp_hands = mp.solutions.hands
mp_drawing = mp.solutions.drawing_utils
```
`mp_hands` is a reference to the MediaPipe Hands solution module, and `mp_drawing` provides the drawing utilities used to render landmarks.
```python
def detect_hands(frame):
    try:
        # Convert the frame from BGR (OpenCV) to RGB (MediaPipe)
        image_rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
        # `hands` is the MediaPipe Hands instance created later in the script
        results = hands.process(image_rgb)
        if results.multi_hand_landmarks:
            for idx, hand_landmarks in enumerate(results.multi_hand_landmarks):
                # Handedness label for this detection: 'Left' or 'Right'
                hand_label = results.multi_handedness[idx].classification[0].label
                if hand_label == 'Left':
                    drawing_spec = mp_drawing.DrawingSpec(color=(0, 0, 255), thickness=2, circle_radius=4)
                else:
                    drawing_spec = mp_drawing.DrawingSpec(color=(0, 255, 0), thickness=2, circle_radius=4)
                mp_drawing.draw_landmarks(
                    frame,
                    hand_landmarks,
                    mp_hands.HAND_CONNECTIONS,
                    drawing_spec,
                    mp_drawing.DrawingSpec(color=(0, 255, 0), thickness=2, circle_radius=2)
                )
                gesture = recognize_gesture(hand_landmarks)
                cv2.putText(frame, f"{hand_label} Hand: {gesture}", (10, 30 * (idx + 1)),
                            cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2, cv2.LINE_AA)
    except Exception as e:
        print(f"Error in processing frame: {e}")
    return frame
```
- Converts the frame from BGR (used by OpenCV) to RGB (used by MediaPipe).
- Processes the RGB image to detect hands.
- If hand landmarks are detected, iterates through each detected hand.
- Determines if the detected hand is left or right.
- Sets different drawing specifications (color and thickness) for left and right hands.
- Draws landmarks on the detected hand using the specified drawing specifications.
- Calls `recognize_gesture` to identify the gesture.
- Displays the recognized gesture on the frame.
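MediaPipe returns landmark coordinates normalized to the range [0, 1]. If pixel positions are needed (for example, for custom overlays), a minimal sketch like the following converts them, assuming `frame` and `hand_landmarks` as in `detect_hands` above:

```python
# Minimal sketch: convert a normalized landmark to pixel coordinates.
# Assumes `frame` (a BGR image) and `hand_landmarks` from detect_hands above.
h, w, _ = frame.shape
wrist = hand_landmarks.landmark[mp_hands.HandLandmark.WRIST]
wrist_px = (int(wrist.x * w), int(wrist.y * h))  # scale normalized coords to pixels
cv2.circle(frame, wrist_px, 6, (255, 0, 0), -1)  # mark the wrist with a filled blue dot
```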
```python
def recognize_gesture(hand_landmarks):
    # Fingertip landmarks; coordinates are normalized, with y increasing downward
    thumb_tip = hand_landmarks.landmark[mp_hands.HandLandmark.THUMB_TIP]
    index_tip = hand_landmarks.landmark[mp_hands.HandLandmark.INDEX_FINGER_TIP]
    middle_tip = hand_landmarks.landmark[mp_hands.HandLandmark.MIDDLE_FINGER_TIP]
    ring_tip = hand_landmarks.landmark[mp_hands.HandLandmark.RING_FINGER_TIP]
    pinky_tip = hand_landmarks.landmark[mp_hands.HandLandmark.PINKY_TIP]
    # A smaller y value means higher in the image, so `<` reads as "above"
    if thumb_tip.y < index_tip.y and middle_tip.y < ring_tip.y and ring_tip.y < pinky_tip.y:
        return "Thumbs Up"
    elif thumb_tip.y > index_tip.y and middle_tip.y > ring_tip.y and ring_tip.y > pinky_tip.y:
        return "Thumbs Down"
    else:
        return "No gesture"
```
- Extracts the coordinates of the fingertips for the thumb, index, middle, ring, and pinky fingers.
- Uses simple conditional logic to recognize gestures.
- Returns "Thumbs Up" if the thumb tip is above the index tip, the middle tip is above the ring tip, and the ring tip is above the pinky tip (a smaller `y` means higher in the image).
- Returns "Thumbs Down" if all of those relationships are reversed.
- Returns "No gesture" for any other configuration.
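Because the logic is plain coordinate comparisons, it is easy to extend, as the feature list above promises. As a sketch (the helper `is_open_palm` is hypothetical, not part of the original project), an open palm could be detected by checking that every fingertip sits above its PIP joint:

```python
# Hypothetical extension: detect an open palm by checking that each fingertip
# is above (has a smaller y than) its PIP joint. Image y grows downward.
def is_open_palm(hand_landmarks):
    lm = hand_landmarks.landmark
    finger_pairs = [
        (mp_hands.HandLandmark.INDEX_FINGER_TIP, mp_hands.HandLandmark.INDEX_FINGER_PIP),
        (mp_hands.HandLandmark.MIDDLE_FINGER_TIP, mp_hands.HandLandmark.MIDDLE_FINGER_PIP),
        (mp_hands.HandLandmark.RING_FINGER_TIP, mp_hands.HandLandmark.RING_FINGER_PIP),
        (mp_hands.HandLandmark.PINKY_TIP, mp_hands.HandLandmark.PINKY_PIP),
    ]
    return all(lm[tip].y < lm[pip].y for tip, pip in finger_pairs)
```

A new branch in `recognize_gesture` could then return "Open Palm" when this check passes.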
```python
hands = mp_hands.Hands(static_image_mode=False, max_num_hands=2,
                       min_detection_confidence=0.8, min_tracking_confidence=0.8)
```
- Initializes the MediaPipe hands model with specific parameters:
  - `static_image_mode=False`: uses the model for video streams (detection plus frame-to-frame tracking).
  - `max_num_hands=2`: detects up to two hands.
  - `min_detection_confidence=0.8`: sets the minimum confidence for detection.
  - `min_tracking_confidence=0.8`: sets the minimum confidence for tracking.
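For comparison, setting `static_image_mode=True` runs full detection on every input instead of tracking between frames, which suits still photos. A minimal sketch, where `hand.jpg` is a hypothetical file name:

```python
# Sketch: the same solution configured for still images instead of video.
# 'hand.jpg' is a hypothetical input file used only for illustration.
with mp_hands.Hands(static_image_mode=True, max_num_hands=2,
                    min_detection_confidence=0.8) as still_hands:
    image = cv2.imread('hand.jpg')
    results = still_hands.process(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))
```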
```python
cap = cv2.VideoCapture(0)
```
- Opens the default webcam (device index `0`) for capturing video.
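Another camera index or a video file path can be passed instead of `0`. A sketch of a fallback, where `demo.mp4` is a hypothetical file:

```python
# Sketch: fall back to a video file if no webcam is available.
# 'demo.mp4' is a hypothetical path, not part of the original project.
cap = cv2.VideoCapture(0)
if not cap.isOpened():
    cap = cv2.VideoCapture('demo.mp4')
```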
```python
while cap.isOpened():
    ret, frame = cap.read()
    if not ret:
        print("Failed to capture image from camera.")
        break
    frame = cv2.flip(frame, 1)  # Flip the frame horizontally for a mirror view
    frame = detect_hands(frame)
    cv2.imshow('MediaPipe Hand Detection', frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break
```
- Continuously captures frames from the webcam.
- Checks if the frame is captured successfully.
- Flips the frame horizontally for a mirror effect.
- Calls `detect_hands` to detect hands, recognize gestures, and draw landmarks.
- Displays the processed frame.
- Exits the loop if the 'q' key is pressed.
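To confirm the loop runs in real time, a per-frame latency overlay can be added inside it. This sketch uses OpenCV's tick counter and is not part of the original script:

```python
# Sketch: measure how long detect_hands takes per frame, in milliseconds.
start = cv2.getTickCount()
frame = detect_hands(frame)
elapsed_ms = (cv2.getTickCount() - start) / cv2.getTickFrequency() * 1000
cv2.putText(frame, f"{elapsed_ms:.1f} ms", (10, 120),
            cv2.FONT_HERSHEY_SIMPLEX, 0.7, (255, 255, 255), 2, cv2.LINE_AA)
```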
```python
cap.release()
cv2.destroyAllWindows()
hands.close()
```
- Releases the webcam.
- Closes all OpenCV windows.
- Releases the MediaPipe hands model.
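If the capture loop can raise (for example, when the camera disconnects), wrapping it in `try`/`finally` guarantees this cleanup still runs; a minimal sketch:

```python
# Sketch: guarantee cleanup even if the capture loop raises an exception.
try:
    while cap.isOpened():
        ret, frame = cap.read()
        if not ret:
            break
        cv2.imshow('MediaPipe Hand Detection', detect_hands(cv2.flip(frame, 1)))
        if cv2.waitKey(1) & 0xFF == ord('q'):
            break
finally:
    cap.release()
    cv2.destroyAllWindows()
    hands.close()
```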
```mermaid
graph TD;
    A[Start] --> B[Initialize MediaPipe Hands];
    B --> C[Initialize webcam];
    C --> D[Read frame from webcam];
    D --> E[Flip frame horizontally];
    E --> F[Detect Hands];
    F --> G[Recognize Gestures];
    G --> H[Draw Landmarks];
    H --> I[Display Frame];
    I --> J[Check for 'q' key press];
    J -->|not pressed| D;
    J -->|pressed| K[Release webcam];
    K --> L[Close OpenCV windows];
    L --> M[Close MediaPipe hands model];
```