YOLO-anime-hands

Example with YOLOv8x:

A model trained only on gwern's data seems to struggle with gloves, handshakes and more complex hands, so I added some custom data. I also tried training nano and medium sized YOLO models, but they had severe accuracy problems.

There is also ADetailer, which has multiple models for this task. On drawn images, however, these models usually produce low-confidence detections, sometimes below 50%, and are prone to misdetections.

Training code reference:

from ultralytics import YOLO

model = YOLO('model.pt')

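# replace coco128.yaml with the YAML file describing your hand dataset
# epochs=10000 is only an upper bound: early stopping ends training much sooner
results = model.train(data='coco128.yaml', epochs=10000, imgsz=640, batch=20, amp=True)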
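Ultralytics ends training once the validation fitness has not improved for `patience` epochs (`patience` is a real training argument). The function below is a simplified, hypothetical sketch of such a patience rule in plain Python, not the library's own implementation; the name `early_stop_epoch` and the fitness values are made up for illustration.

```python
from typing import Optional, Sequence


def early_stop_epoch(fitness_per_epoch: Sequence[float], patience: int) -> Optional[int]:
    """Return the epoch index at which training would stop, or None.

    Simplified patience rule (illustrative only): stop once `patience`
    epochs have passed since the best fitness seen so far. Only a strictly
    higher fitness counts as an improvement in this sketch.
    """
    best_fitness = float("-inf")
    best_epoch = 0
    for epoch, fitness in enumerate(fitness_per_epoch):
        if fitness > best_fitness:
            best_fitness = fitness
            best_epoch = epoch
        # number of epochs without improvement
        if epoch - best_epoch >= patience:
            return epoch
    return None  # the epoch budget ran out before patience was exhausted


print(early_stop_epoch([0.1, 0.3, 0.5, 0.5, 0.4, 0.45], patience=3))
```

With `patience=3` this sequence stops at epoch index 5, three epochs after the best value at index 2. This is why a very large `epochs` value is harmless: the run ends as soon as the patience window passes without improvement.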

Usage example:

import cv2
from ultralytics import YOLO

model = YOLO('model.pt')

# optionally pass conf=0.5 to drop detections below 50% confidence
results = model('test.jpg')

for r in results:
    # plot() returns the annotated image as a BGR numpy array, which is
    # what cv2.imwrite expects, so no color conversion is needed
    im_array = r.plot()
    cv2.imwrite("test_output.jpg", im_array)
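The `conf=0.5` option hinted at in the usage example discards detections below a confidence threshold, which matters for the low-confidence detections described earlier. The sketch below shows that filtering step independently of any library, using a hypothetical `Detection` record (the record type and sample values are assumptions; with Ultralytics the confidences of a result `r` are available as `r.boxes.conf`).

```python
from typing import List, NamedTuple, Tuple


class Detection(NamedTuple):
    """Hypothetical minimal detection record (not an Ultralytics type)."""
    label: str
    conf: float
    box: Tuple[float, float, float, float]  # x1, y1, x2, y2 in pixels


def filter_by_confidence(detections: List[Detection],
                         conf_threshold: float = 0.5) -> List[Detection]:
    """Keep only detections whose confidence is strictly above conf_threshold."""
    return [d for d in detections if d.conf > conf_threshold]


detections = [
    Detection("hand", 0.91, (120.0, 80.0, 180.0, 150.0)),
    Detection("hand", 0.42, (300.0, 210.0, 340.0, 260.0)),  # low-confidence detection
    Detection("hand", 0.50, (15.0, 40.0, 60.0, 90.0)),      # exactly at the threshold
]
kept = filter_by_confidence(detections, conf_threshold=0.5)
print([d.conf for d in kept])
```

A higher threshold trades recall for fewer false positives, so a model whose correct detections sit well above 50% confidence can use a stricter cut-off without losing hands.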

To process a folder with images:

import os
import cv2
from ultralytics import YOLO
from tqdm import tqdm

model = YOLO('model.pt')

# set these to your folders; use an output folder different from the input
# folder, otherwise the annotated images overwrite the originals
input_folder = '/'
output_folder = '/'

os.makedirs(output_folder, exist_ok=True)

for file_name in tqdm(os.listdir(input_folder)):
    # match extensions case-insensitively (e.g. .JPG as well as .jpg)
    if file_name.lower().endswith(('.jpg', '.jpeg', '.png', '.webp')):
        image_path = os.path.join(input_folder, file_name)
        output_path = os.path.join(output_folder, file_name)
        results = model(image_path)

        for r in results:
            # plot() returns a BGR array, which cv2.imwrite expects
            cv2.imwrite(output_path, r.plot())
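A small stdlib-only helper can collect the (input, output) path pairs for such a loop up front. This is a hypothetical sketch (`list_image_jobs` and `IMAGE_EXTENSIONS` are illustrative names, not part of Ultralytics): it matches extensions case-insensitively so `.JPG` files are not skipped, and it refuses to use the input folder as the output folder so originals are never overwritten.

```python
import os
from typing import List, Tuple

# Assumed set of extensions: those of the loop above plus ".jpeg".
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png", ".webp")


def list_image_jobs(input_folder: str, output_folder: str) -> List[Tuple[str, str]]:
    """Return sorted (input_path, output_path) pairs for the images in input_folder.

    Raises ValueError if both folders resolve to the same directory, so the
    annotated outputs can never overwrite the source images.
    """
    if os.path.realpath(input_folder) == os.path.realpath(output_folder):
        raise ValueError("output_folder must be different from input_folder")
    os.makedirs(output_folder, exist_ok=True)
    jobs = []
    for file_name in sorted(os.listdir(input_folder)):
        input_path = os.path.join(input_folder, file_name)
        # skip sub-directories; lower() makes ".JPG" match ".jpg"
        if os.path.isfile(input_path) and file_name.lower().endswith(IMAGE_EXTENSIONS):
            jobs.append((input_path, os.path.join(output_folder, file_name)))
    return jobs
```

The loop above could then iterate over `tqdm(list_image_jobs(input_folder, output_folder))` and write `r.plot()` for each result to the matching output path.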

Graphs

Trained on an RTX 4090 with the Prodigy optimizer set to 1. Training uses ultralytics/ultralytics commit db2af70d3910f168a62ecaae4d920e1440f08c7e because newer versions seem to have convergence problems and train much slower, possibly due to unsuitable defaults.

YOLOv8x_*: best, last, best ONNX, best dynamic ONNX, last ONNX, last dynamic ONNX, CSV

992 epochs
batch 30 (?)
dataset:
- gwern (5371 images)

YOLOv8x_*_finetuned: best, last, best ONNX, best dynamic ONNX, last ONNX, last dynamic ONNX, CSV

pretrained from: the gwern-trained YOLOv8x
704 epochs
batch 30 (?)
dataset:
- own custom data (924 images)

YOLOv9e_*: best, last, best ONNX, best dynamic ONNX, last ONNX, last dynamic ONNX, CSV

1181 epochs
53.533 hours
batch 14
dataset:
- gwern (5371 images)

YOLOv9e_*_finetuned: best, last, best ONNX, best dynamic ONNX, last ONNX, last dynamic ONNX, CSV

pretrained from: the gwern-trained YOLOv9e
725 epochs
6.852 hours
batch 14
dataset:
- own custom data (1069 images)

YOLOv9e_gwern+own_*: best, last, best ONNX, best dynamic ONNX, last ONNX, last dynamic ONNX, CSV

1177 epochs
56.442 hours
batch 14
dataset (6440 images):
- gwern (5371 images)
- own data (1069 images)
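How the datasets were merged is not described here. When several YOLO-format datasets are combined into one hand detector, every source has to use the same class index for hands. The sketch below is hypothetical: it assumes standard YOLO label files (one `class x_center y_center width height` line per box, normalized coordinates), that hands should end up as class 0, and the function names and class ids are illustrative only.

```python
from typing import Dict, List


def remap_yolo_labels(lines: List[str], class_map: Dict[int, int]) -> List[str]:
    """Rewrite the class id of each YOLO label line using class_map.

    Lines whose class id is not in class_map are dropped, so classes that
    are not hands in a source dataset are discarded while merging.
    """
    remapped = []
    for line in lines:
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        class_id = int(parts[0])
        if class_id in class_map:
            remapped.append(" ".join([str(class_map[class_id])] + parts[1:]))
    return remapped


def remap_label_file(src_path: str, dst_path: str, class_map: Dict[int, int]) -> None:
    """Apply remap_yolo_labels to one label file and write the result."""
    with open(src_path, encoding="utf-8") as f:
        lines = f.read().splitlines()
    with open(dst_path, "w", encoding="utf-8") as f:
        f.write("".join(line + "\n" for line in remap_yolo_labels(lines, class_map)))
```

For example, a source dataset that (hypothetically) labels hands as class 2 would be merged with `class_map={2: 0}`; box coordinates are copied unchanged because they are already normalized per image.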

YOLOv9e_all_*: best, last, best ONNX, best dynamic ONNX, last ONNX, last dynamic ONNX, CSV

1166 epochs
172.749 hours (~7.2 days)
batch 14
dataset (17392 images):
- gwern (5371 images)
- own data (1069 images)
- 1-yshhi/anhdet (5705 images)
- catwithawand/hand-detection-fuao9 (5247 images)
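The listed training times are easier to compare per epoch. The snippet below divides each YOLOv9e run's wall-clock hours by its epoch count, assuming the hours cover the whole run including validation; the `RUNS` dictionary only restates the numbers above.

```python
# (hours, epochs) for the YOLOv9e runs, copied from the lists above
RUNS = {
    "YOLOv9e": (53.533, 1181),
    "YOLOv9e_finetuned": (6.852, 725),
    "YOLOv9e_gwern+own": (56.442, 1177),
    "YOLOv9e_all": (172.749, 1166),
}


def minutes_per_epoch(hours: float, epochs: int) -> float:
    """Average wall-clock minutes per epoch over a whole training run."""
    return hours * 60 / epochs


for name, (hours, epochs) in RUNS.items():
    print(f"{name}: {minutes_per_epoch(hours, epochs):.2f} min/epoch")
```

This gives about 2.72 minutes per epoch for the gwern-only run and about 8.89 for the run on all 17392 images.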