Bubble detector using YOLOv4

Note: This is not the final version of the code. I will refine and update it.

Overview

This model detects speech bubbles in webtoons and cartoons. I referenced and adapted pytorch-YOLOv4 to detect speech bubbles. The key to improving performance is data analysis: speech bubbles come in many forms, so I define those forms and present training results that take the distribution of the data into account.


Definition of Speech Bubble

Various speech bubble forms of real webtoons

image

  • Speech bubbles in real webtoons come in a wide variety of colors and shapes.

New Definition

Key criteria for data definition: Shape, Color, Form

standard

  • Shape : Ellipse(tawon), Thorn(gasi), Sea_urchin(seonggye), Rectangle(sagak), Cloud(gurm)

  • Color : Black/white(bw), Colorful(color), Transparency(tran), Gradation

  • Form : Basic, Double Speech bubble, Multi-External, Scatter-type

  • example image image

  • In this project, only two of these categories, shape and color, are used to define the classes; form and Gradation are grouped together as etc.
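
As a rough illustration of how the shape and color categories combine, the sketch below enumerates every shape/color pair. The abbreviations follow the lists above and the class names used in the validation table of this README; the helper name `build_class_names` is hypothetical and not part of the repository.

```python
from itertools import product

# Abbreviations as they appear in this README's validation table.
SHAPES = ["tawon", "gasi", "seonggye", "sagak", "gurm"]
COLORS = ["bw", "color", "Transparency"]


def build_class_names(shapes=SHAPES, colors=COLORS):
    """Return every shape_color combination, grouped by shape."""
    return [f"{shape}_{color}" for shape, color in product(shapes, colors)]


class_names = build_class_names()
print(len(class_names))   # 5 shapes x 3 colors
print(class_names[:3])
```

These names describe the data distribution only; the detector itself is trained with a single class.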


classes

These classes are not used for detection; they describe the distribution of the speech bubble data.

image


Install dependencies

  • PyTorch Version

    • PyTorch 1.4.0 for TensorRT 7.0 and higher
    • PyTorch 1.5.0 and 1.6.0 for TensorRT 7.1.2 and higher
  • Install Dependencies Code

    pip install onnxruntime numpy torch tensorboardX scikit_image tqdm easydict Pillow opencv_python pycocotools
    

    or

    pip install -r requirements.txt
    

Pretrained model

Model Link
YOLOv4 Link
YOLOv4-bubble Link

Train

  • 1. Download the weights

  • 2. Train

    python train.py -g gpu_id -classes num_classes -dir 'data_dir' -pretrained 'pretrained_model.pth'
    

    or

    Train.sh 
    
  • 3. Config setting

    • cfg.py

      • class = 1
      • learning_rate = 0.001
      • max_batches = 2000 (class * 2000)
      • steps = [1600, 1800] (max_batches * 0.8, max_batches * 0.9)
      • train_dir = your dataset root
        • root tree
          image
          The image folder contains the .jpg or .png image files. The XML folder contains the .xml label files.
    • cfg/yolov4.cfg

      • classes = 1
      • filters = 18, computed as (4 + 1 + classes) * 3 (lines 961, 1049, 1137)
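
The arithmetic behind these settings can be sketched as a small helper. This is only an illustration of the formulas listed above; the function name `yolo_settings` and the `anchors_per_scale` parameter are hypothetical, and the repository does not necessarily compute the values this way.

```python
def yolo_settings(num_classes, anchors_per_scale=3):
    """Derive the training settings listed above from the class count."""
    max_batches = num_classes * 2000
    # Learning-rate steps at 80% and 90% of max_batches (integer math).
    steps = [max_batches * 8 // 10, max_batches * 9 // 10]
    # Each anchor predicts 4 box coordinates, 1 objectness score,
    # and one score per class.
    filters = (4 + 1 + num_classes) * anchors_per_scale
    return {
        "classes": num_classes,
        "max_batches": max_batches,
        "steps": steps,
        "filters": filters,
    }


print(yolo_settings(1))
```

For a dataset with a different number of classes, call the helper with that count and copy the results into cfg.py and cfg/yolov4.cfg by hand.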

If you want to train on a custom dataset, use the settings above.
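
Before training on a custom dataset, it can help to check that every image has a label file. The sketch below is a minimal check under one assumption that this README does not state: each .xml label shares its base name with its image. The function name and the directory arguments are hypothetical; pass whatever folders your dataset root actually uses.

```python
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".png"}


def pair_images_with_labels(image_dir, xml_dir):
    """Match images to XML labels by base name (an assumed convention).

    Returns (pairs, missing): pairs is a list of (image, label) paths,
    missing is a list of images that have no matching label file.
    """
    image_dir, xml_dir = Path(image_dir), Path(xml_dir)
    labels = {p.stem: p for p in xml_dir.iterdir() if p.suffix.lower() == ".xml"}
    pairs, missing = [], []
    for image in sorted(image_dir.iterdir()):
        if image.suffix.lower() not in IMAGE_SUFFIXES:
            continue  # skip anything that is not a .jpg or .png file
        label = labels.get(image.stem)
        if label is None:
            missing.append(image)
        else:
            pairs.append((image, label))
    return pairs, missing
```

Calling `pair_images_with_labels(image_folder, xml_folder)` and printing `missing` gives a quick list of images that still need labels.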


Demo

  • 1. Download the weights
  • 2. Demo
    python demo.py -cfgfile cfgfile -weightfile pretrained_model.pth -imgfile image_dir 
    
    • The default cfgfile is ./cfg/yolov4.cfg

Metric

  • 1. Validation dataset

    Class                    Count
    tawon_bw                 116
    tawon_color              70
    tawon_Transparency       68
    gasi_bw                  65
    gasi_color               29
    gasi_Transparency        59
    seonggye_bw              51
    seonggye_color           43
    seonggye_Transparency    44
    sagak_bw                 42
    sagak_color              33
    sagak_Transparency       69
    gurm_bw                  47
    gurm_color               2
    gurm_Transparency        12
    total                    750

  • The counts above are for individual speech bubbles, not cuts.
  • The distribution is uneven because a single cut can contain several speech bubbles. In addition, examples of some classes are hard to find, which leads to the imbalance shown above.
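
To make the imbalance easier to see, the validation counts listed above can be summed per shape. This is only a sketch for inspecting the numbers in this README; the helper name `shape_totals` is hypothetical.

```python
# Validation counts copied from the table in this README.
VAL_COUNTS = {
    "tawon_bw": 116, "tawon_color": 70, "tawon_Transparency": 68,
    "gasi_bw": 65, "gasi_color": 29, "gasi_Transparency": 59,
    "seonggye_bw": 51, "seonggye_color": 43, "seonggye_Transparency": 44,
    "sagak_bw": 42, "sagak_color": 33, "sagak_Transparency": 69,
    "gurm_bw": 47, "gurm_color": 2, "gurm_Transparency": 12,
}


def shape_totals(counts):
    """Sum the per-class counts by the shape prefix of each class name."""
    totals = {}
    for name, count in counts.items():
        shape = name.split("_", 1)[0]
        totals[shape] = totals.get(shape, 0) + count
    return totals


totals = shape_totals(VAL_COUNTS)
grand_total = sum(VAL_COUNTS.values())
for shape, count in totals.items():
    print(f"{shape:10s} {count:4d} {count / grand_total:6.1%}")
```

The per-shape view shows that gurm (cloud) bubbles are the rarest shape in the validation set, with gurm_color contributing only 2 examples.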