A model trained only on Gwern's data seems to struggle with gloves, handshakes and more complex hands, so I added some custom data. I also tried training nano- and medium-sized YOLO models, but they had severe accuracy problems.
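Adding custom data means labeling it in the format Ultralytics YOLO trains on: one `.txt` file per image, one line per box, holding the class index followed by the box center, width and height, all normalized by the image size. Below is a minimal sketch of converting a pixel-space box into such a line. It assumes boxes are annotated as corner coordinates and that hands are class 0. The helper `to_yolo_line` is hypothetical and not part of any library.

```python
def to_yolo_line(class_id: int, x_min: float, y_min: float, x_max: float, y_max: float,
                 img_w: int, img_h: int) -> str:
    """Return a YOLO label line for a box given in pixel corner coordinates (hypothetical helper)."""
    # Clamp the box to the image so the normalized values stay inside [0, 1].
    x_min, x_max = max(0.0, x_min), min(float(img_w), x_max)
    y_min, y_max = max(0.0, y_min), min(float(img_h), y_max)
    if x_max <= x_min or y_max <= y_min:
        raise ValueError("box has no area inside the image")
    # YOLO stores the box center and size, normalized by image width/height.
    x_c = (x_min + x_max) / 2 / img_w
    y_c = (y_min + y_max) / 2 / img_h
    w = (x_max - x_min) / img_w
    h = (y_max - y_min) / img_h
    return f"{class_id} {x_c:.6f} {y_c:.6f} {w:.6f} {h:.6f}"


# One hand (class 0, assumed to be the only class) in a 640x480 image.
print(to_yolo_line(0, 100, 120, 228, 264, 640, 480))  # 0 0.256250 0.400000 0.200000 0.300000
```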
There is also ADetailer, which provides several models for this task. On drawn images, however, these models usually produce low-confidence detections (sometimes below 50%) and are prone to misdetections.
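Training code reference:
```python
from ultralytics import YOLO

model = YOLO('model.pt')
# epochs is deliberately huge: training ends early once early stopping triggers (see `patience`).
# 'coco128.yaml' is only a placeholder; point data= at your own dataset YAML.
results = model.train(data='coco128.yaml', epochs=10000, imgsz=640, batch=20, amp=True)
```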
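The `data='coco128.yaml'` argument is just the stock example. Training on hand data needs a dataset YAML that points at your own images. Below is a minimal sketch that builds such a file's text for a single `hand` class. It assumes the common Ultralytics layout of `images/train` and `images/val` folders with matching `labels/` folders. The function name and folder layout are assumptions.

```python
from pathlib import Path


def make_dataset_yaml(root: str, class_name: str = "hand") -> str:
    """Return the text of a minimal single-class Ultralytics dataset YAML (hypothetical helper)."""
    # Assumed layout: <root>/images/{train,val} with labels in <root>/labels/{train,val};
    # Ultralytics finds each label file by swapping "images" for "labels" in the image path.
    root_path = Path(root).resolve().as_posix()  # absolute path with forward slashes
    return (
        f"path: {root_path}\n"
        "train: images/train\n"
        "val: images/val\n"
        "names:\n"
        f"  0: {class_name}\n"
    )


# Usage (hypothetical paths):
# Path("hands.yaml").write_text(make_dataset_yaml("datasets/hands"), encoding="utf-8")
# model.train(data="hands.yaml", epochs=10000, imgsz=640, batch=20, amp=True)
```

Writing the text to a file such as `hands.yaml` and passing `data='hands.yaml'` to `model.train` would take the place of the `coco128.yaml` placeholder.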
Usage example:
```python
import cv2
from ultralytics import YOLO

model = YOLO('model.pt')
# Optionally pass conf=0.5 to drop detections below 50% confidence.
results = model('test.jpg')
for r in results:
    # plot() returns the annotated image as a BGR NumPy array,
    # which is already the channel order cv2.imwrite expects.
    im_array = r.plot()
    cv2.imwrite("test_output.jpg", im_array)
```
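`r.plot()` returns a BGR array, which is OpenCV's channel order, while PIL's `Image.fromarray` expects RGB. Converting for PIL therefore means reversing the last axis with `[..., ::-1]`. Reversing twice, or reversing and then applying `cv2.COLOR_BGR2RGB`, gives back the original BGR array. A tiny NumPy check of that round trip, using a made-up 1x1 image:

```python
import numpy as np

# A 1x1 "image" whose single pixel is pure blue in BGR order (OpenCV convention).
bgr = np.zeros((1, 1, 3), dtype=np.uint8)
bgr[0, 0] = (255, 0, 0)  # B=255, G=0, R=0

# Reversing the channel axis turns BGR into RGB (the order PIL expects).
rgb = bgr[..., ::-1]
print(rgb[0, 0].tolist())  # [0, 0, 255]

# Reversing again restores the BGR array, so cv2.imwrite can take r.plot() as-is.
assert (rgb[..., ::-1] == bgr).all()

# [..., ::-1] is a non-contiguous view; copy it if a library insists on contiguous memory.
rgb_contiguous = np.ascontiguousarray(rgb)
```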
To process a folder with images:
```python
import os

import cv2
from tqdm import tqdm
from ultralytics import YOLO

model = YOLO('model.pt')
input_folder = '/'   # set to the folder containing your images
output_folder = '/'  # set to a different folder, or the originals get overwritten
os.makedirs(output_folder, exist_ok=True)

# Compare extensions case-insensitively so files like IMG.JPG are included too.
image_files = [f for f in os.listdir(input_folder)
               if f.lower().endswith(('.jpg', '.jpeg', '.png', '.webp'))]
for file_name in tqdm(image_files):
    image_path = os.path.join(input_folder, file_name)
    output_path = os.path.join(output_folder, file_name)
    results = model(image_path)
    for r in results:
        # plot() returns a BGR array, matching what cv2.imwrite expects.
        cv2.imwrite(output_path, r.plot())
```
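A plain `str.endswith` check is case-sensitive, so `IMG_01.JPG` would not match `'.jpg'`. `os.listdir` also does not descend into subfolders. Below is a stdlib sketch of a more forgiving file filter built on `pathlib`, with optional recursion. The function `list_images` and the extension set are assumptions, not part of Ultralytics.

```python
from pathlib import Path

# Extensions treated as images; compared case-insensitively (assumed set).
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".webp"}


def list_images(folder: str, recursive: bool = False) -> list[Path]:
    """Return sorted image files in `folder`, matching IMAGE_EXTS regardless of case."""
    root = Path(folder)
    # rglob("*") walks all subfolders; iterdir() only lists the top level.
    candidates = root.rglob("*") if recursive else root.iterdir()
    return sorted(p for p in candidates if p.is_file() and p.suffix.lower() in IMAGE_EXTS)
```

In the folder loop this would be used as `for path in list_images(input_folder): results = model(str(path))`. With `recursive=True`, the output path should mirror the subfolder structure so that files with the same name do not collide.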
Trained on an RTX 4090 with the Prodigy optimizer (lr set to 1), using ultralytics/ultralytics commit db2af70d3910f168a62ecaae4d920e1440f08c7e. Newer versions seem to have convergence problems and train much more slowly, possibly because of unsuitable defaults.
YOLOv8x_*: best, last, best ONNX, best dynamic ONNX, last ONNX, last dynamic ONNX, CSV
- 992 epochs
- batch 30 (?)
- dataset:
  - gwern (5371 images)
YOLOv8x_*_finetuned: best, last, best ONNX, best dynamic ONNX, last ONNX, last dynamic ONNX, CSV
- fine-tuned from the Gwern-trained YOLOv8x
- 704 epochs
- batch 30 (?)
- dataset:
  - own custom data (924 images)
YOLOv9e_*: best, last, best ONNX, best dynamic ONNX, last ONNX, last dynamic ONNX, CSV
- 1181 epochs
- 53.533 hours
- batch 14
- dataset:
  - gwern (5371 images)
YOLOv9e_*_finetuned: best, last, best ONNX, best dynamic ONNX, last ONNX, last dynamic ONNX, CSV
- fine-tuned from the Gwern-trained YOLOv9e
- 725 epochs
- 6.852 hours
- batch 14
- dataset:
  - own custom data (1069 images)
YOLOv9e_gwern+own_*: best, last, best ONNX, best dynamic ONNX, last ONNX, last dynamic ONNX, CSV
- 1177 epochs
- 56.442 hours
- batch 14
- dataset (6440 images):
  - gwern (5371 images)
  - own data (1069 images)
YOLOv9e_all_*: best, last, best ONNX, best dynamic ONNX, last ONNX, last dynamic ONNX, CSV
- 1166 epochs
- 172.749 hours (~7.2 days)
- batch 14
- dataset (17392 images):
  - gwern (5371 images)
  - own data (1069 images)
  - 1-yshhi/anhdet (5705 images)
  - catwithawand/hand-detection-fuao9 (5247 images)
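The listed training times are easier to compare when normalized per epoch. The sketch below derives average minutes per epoch, and minutes per 1000 dataset images, from the numbers above. It assumes each epoch covers the full listed image count, since the train/validation split is not stated. The averages include validation and any other per-epoch overhead.

```python
# YOLOv9e runs listed above: (name, epochs, wall-clock hours, dataset images).
RUNS = [
    ("YOLOv9e", 1181, 53.533, 5371),
    ("YOLOv9e_finetuned", 725, 6.852, 1069),
    ("YOLOv9e_gwern+own", 1177, 56.442, 6440),
    ("YOLOv9e_all", 1166, 172.749, 17392),
]

# Sanity check: the combined datasets add up to the listed totals.
assert 5371 + 1069 == 6440
assert 5371 + 1069 + 5705 + 5247 == 17392


def minutes_per_epoch(epochs: int, hours: float) -> float:
    """Average wall-clock minutes per epoch."""
    return hours * 60 / epochs


for name, epochs, hours, images in RUNS:
    per_epoch = minutes_per_epoch(epochs, hours)
    # Normalize by dataset size so runs on different datasets are comparable.
    per_1k = per_epoch / images * 1000
    print(f"{name:20s} {per_epoch:5.2f} min/epoch  {per_1k:4.2f} min per 1000 images")
```

With the listed values, all four runs come out at roughly 0.45 to 0.53 minutes per 1000 images. The longer wall-clock times therefore appear to track dataset size rather than a change in per-image cost.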
Dataset graphs (figures not reproduced):
- Gwern's dataset (5371 images)
- My custom dataset (1069 images)
- Gwern's + own data (6440 images): gwern (5371) + own data (1069)
- All combined (17392 images): gwern (5371), own data (1069), 1-yshhi/anhdet (5705), catwithawand/hand-detection-fuao9 (5247)
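Image counts like these, plus the box-count and box-size summaries that dataset plots usually show, can be computed directly from YOLO-format label files. Below is a stdlib sketch. It assumes one `.txt` label file per image in a single folder; the folder and the function `label_stats` are hypothetical. Images without a label file, such as background-only images, are not counted.

```python
from pathlib import Path
from statistics import mean


def label_stats(label_dir: str) -> dict:
    """Summarize YOLO label files: images, boxes, mean boxes per image, mean box size."""
    boxes_per_image: list[int] = []
    widths: list[float] = []
    heights: list[float] = []
    for txt in sorted(Path(label_dir).glob("*.txt")):
        if txt.name == "classes.txt":  # some labeling tools write a class list here
            continue
        rows = [ln.split() for ln in txt.read_text(encoding="utf-8").splitlines() if ln.strip()]
        boxes_per_image.append(len(rows))
        for parts in rows:
            # Each row: class x_center y_center width height (normalized to [0, 1]).
            widths.append(float(parts[3]))
            heights.append(float(parts[4]))
    return {
        "images": len(boxes_per_image),
        "boxes": sum(boxes_per_image),
        "mean_boxes_per_image": mean(boxes_per_image) if boxes_per_image else 0.0,
        "mean_width": mean(widths) if widths else 0.0,
        "mean_height": mean(heights) if heights else 0.0,
    }
```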