COO: Comic onomatopoeia dataset (ECCV 2022)

COO: Comic Onomatopoeia Dataset for Recognizing Arbitrary or Truncated Texts

Official repository of COO (ECCV 2022) | Paper | Dataset, Sample | Codes | Leaderboard | Poster | Video

We provide the COmic Onomatopoeia dataset (COO) and the source code used in our paper.

  1. COO has many arbitrary texts, such as extremely curved, partially shrunk, or arbitrarily placed texts. Furthermore, some texts are separated into several parts. Each part is a truncated text and is not meaningful by itself; the parts must be linked to represent the intended meaning. Thus, we propose a novel task that predicts the link between truncated texts.

  2. COO is a challenging text dataset. Detecting the onomatopoeia region and capturing the intended meaning of truncated texts are very difficult. If a model can recognize comic onomatopoeias, we expect that the model can also recognize other less difficult texts. We hope our work will encourage studies on more irregular texts and further improve text detection, recognition, and link prediction methods.

  3. In text detection and recognition tasks, researchers use various datasets to validate their methods. We hope COO can also help validate such methods. In addition, we hope COO will be used to analyze Japanese onomatopoeia.


Dataset: Comic Onomatopoeia (COO)

We provide the annotations of COO.
Several files that help with preprocessing, visualization, and data analysis are in the COO-data folder.
COO has 61,465 polygons and 2,261 links between truncated texts.
The figure below shows the COO statistics and the character types in COO (182 types in total).

Prerequisites: Download Manga109 images

According to the Manga109 license, redistribution of the Manga109 images is not permitted.
Thus, you should download the images via the Manga109 webpage.

After downloading, unzip Manga109.zip and move the images folder of Manga109 into the COO-data folder.
Preprocessing requires the images folder inside the COO-data folder (i.e., COO-data/images).
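
Before running any preprocessing, it may help to confirm that the folder is where it is expected. The following is a minimal sketch and not part of the repository: the helper name `has_manga109_images` is hypothetical, and it checks only that `COO-data/images` exists and is non-empty, since the internal layout of the Manga109 images folder is not described here.

```python
from pathlib import Path


def has_manga109_images(coo_data_dir: str = "COO-data") -> bool:
    """Return True if <coo_data_dir>/images exists and has at least one entry.

    Hypothetical helper for illustration; it does not validate the contents.
    """
    images_dir = Path(coo_data_dir) / "images"
    return images_dir.is_dir() and any(images_dir.iterdir())


if __name__ == "__main__":
    if has_manga109_images():
        print("Found COO-data/images.")
    else:
        print("COO-data/images is missing or empty; download Manga109 first.")
```

Run it from the repository root, or pass the path to the COO-data folder explicitly.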

Preprocessing for each model

  1. Run the following command.

    pip install Flask==2.0.2 Shapely==1.8.0 manga109api==0.3.1 pillow natsort lmdb opencv-python numpy tqdm
    
  2. See the dataset section in each model folder.


Codes

The source code used in our paper is provided in each folder.
For text detection, we used ABCNetv2 and MTSv3.
For text recognition, we used TRBA.
For link prediction, we used M4C-COO (a variant of M4C).


Leaderboard

We will list the results of SOTA methods that provide official code.
For the leaderboard, we report the performance of one pretrained model.
Note that in our paper we report the average of three trials.
We welcome pull requests containing the official code (URL) of other SOTA methods.

Text detection

  • P: Precision    R: Recall    H: Hmean    *: only the detection part is used.
  • Based on the hmean, methods are sorted in descending order.

| Method | P | R | H | Official Code | Pretrained model |
|---|---|---|---|---|---|
| DBNet++ (TPAMI 2022) | 90.8 | 60.9 | 72.9 | URL | download |
| DBNet (AAAI 2020) | 90.9 | 60.3 | 72.5 | URL | download |
| PAN (ICCV 2019) | 88.4 | 58.6 | 70.4 | URL | download |
| PAN++* (TPAMI 2021) | 78.3 | 62.7 | 69.7 | URL | download |
| MTSv3* (ECCV 2020) | 70.1 | 66.0 | 68.0 | URL | download |
| PSENet (CVPR 2019) | 83.3 | 57.1 | 67.8 | URL | download |
| ABCNetv2* (TPAMI 2021) | 67.2 | 65.1 | 66.1 | URL | download |
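
H in the table is the harmonic mean of precision and recall. As a quick sanity check on a reported row, the sketch below recomputes it. The function name `hmean` is an illustrative choice and is not taken from the repository's evaluation code.

```python
def hmean(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both on the same scale, e.g. percent)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# DBNet++ row: P = 90.8, R = 60.9
print(round(hmean(90.8, 60.9), 1))  # 72.9
```

Because the table lists rounded P and R, a value recomputed this way can differ from the listed H in the last digit for some rows.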

Text recognition

  • Based on the accuracy, methods are sorted in descending order.

| Method | Accuracy | Official Code | Pretrained model |
|---|---|---|---|
| TRBA+2D (ours) | 81.2 | URL | download |
| MASTER (PR 2021) | 74.6 | URL | download |
| ABINet w/o pretrain (CVPR 2021) | 70.6 | URL | download |

Link prediction

  • P: Precision    R: Recall    H: Hmean
  • Based on the hmean, methods are sorted in descending order.

| Method | P | R | H | Official Code | Pretrained model |
|---|---|---|---|---|---|
| M4C-COO (ours) | 74.5 | 66.3 | 70.2 | URL | download |
| M4C-COO with vocab 11640 (ours) | 59.4 | 44.6 | 51.0 | URL | download |
| Distance-based rule (ours) | 1.1 | 74.5 | 2.1 | - | - |
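
To make the three metrics concrete for links, the sketch below scores predicted links against ground-truth links. It assumes, purely for illustration, that a link is an unordered pair of text-region IDs and that a prediction counts as correct only on an exact pair match. The actual evaluation protocol is defined by the paper and the M4C-COO code, and the name `score_links` is hypothetical.

```python
from typing import Iterable, Tuple

Link = Tuple[int, int]


def score_links(
    predicted: Iterable[Link], ground_truth: Iterable[Link]
) -> Tuple[float, float, float]:
    """Return (precision, recall, hmean) in percent for sets of unordered links.

    Assumption: a link is an unordered pair of region IDs, matched exactly.
    """
    pred = {frozenset(p) for p in predicted}
    gt = {frozenset(g) for g in ground_truth}
    correct = len(pred & gt)
    precision = 100.0 * correct / len(pred) if pred else 0.0
    recall = 100.0 * correct / len(gt) if gt else 0.0
    total = precision + recall
    h = 2 * precision * recall / total if total else 0.0
    return precision, recall, h


# Toy example: two of three predictions match two of four ground-truth links.
p, r, h = score_links([(1, 2), (4, 3), (5, 6)], [(2, 1), (3, 4), (7, 8), (9, 10)])
```

Under this kind of scoring, a method that predicts far more links than actually exist can reach high recall while its precision stays very low.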

Citation

If you use the annotations of the Comic Onomatopoeia dataset (COO) or find this work useful for your research, please cite our paper.

@inproceedings{baek2022COO,
  title={COO: Comic Onomatopoeia Dataset for Recognizing Arbitrary or Truncated Texts},
  author={Baek, Jeonghun and Matsui, Yusuke and Aizawa, Kiyoharu},
  booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},
  year={2022}
}

Contact

Feel free to contact us if you have any questions: Jeonghun Baek ku21fang@gmail.com

License

For the dataset, the annotation data of COO is licensed under CC BY 4.0.
The license of the Manga109 image data is described here.

The code written by us is licensed under MIT.
After examining the licenses of the original source code of each method used in our work, we found that redistribution of the source code is permitted. Thus, to facilitate future work, we provide the source code in this repository. Please let us know if there is a license issue with code redistribution. If so, we will remove the code and provide instructions to reproduce our work.