
(CVPR 2022) Text Spotting Transformers


TESTR: Text Spotting Transformers

This repository is the official implementation of the following paper:

Text Spotting Transformers

Xiang Zhang, Yongwen Su, Subarna Tripathi, and Zhuowen Tu, CVPR 2022

Getting Started

We use the following environment in our experiments. We recommend installing the dependencies via Anaconda.

  • CUDA 11.3
  • Python 3.8
  • PyTorch 1.10.1
  • Official Pre-Built Detectron2
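As a quick sanity check, the versions listed above can be compared against the active interpreter. This is a minimal sketch, not part of the repository: the helper names `collect_versions` and `mismatches` are hypothetical, and the script only reports differences rather than enforcing them, since other version combinations may also work.

```python
import sys

# Versions listed in this README, used here as the reference environment.
EXPECTED = {"python": "3.8", "torch": "1.10.1", "cuda": "11.3"}


def collect_versions():
    """Return the versions visible from the current interpreter (None if unavailable)."""
    found = {
        "python": f"{sys.version_info.major}.{sys.version_info.minor}",
        "torch": None,
        "cuda": None,
    }
    try:
        import torch

        # Drop local build suffixes such as "+cu113".
        found["torch"] = str(torch.__version__).split("+")[0]
        # torch.version.cuda is None for CPU-only builds.
        found["cuda"] = torch.version.cuda
    except ImportError:
        pass
    return found


def mismatches(found, expected=EXPECTED):
    """Map each differing component to a (found, expected) pair."""
    return {k: (found.get(k), v) for k, v in expected.items() if found.get(k) != v}


if __name__ == "__main__":
    for name, (got, want) in mismatches(collect_versions()).items():
        print(f"{name}: found {got}, README lists {want}")
```

An empty output means the interpreter matches the listed Python, PyTorch, and CUDA versions; Detectron2 is not checked here.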

Installation

Please refer to the Installation section of AdelaiDet: README.md.

If you have not installed Detectron2, follow the official guide: INSTALL.md.

After that, build this repository with

python setup.py build develop

Preparing Datasets

Please download TotalText, CTW1500, MLT, and CurvedSynText150k according to the guide provided by AdelaiDet: README.md.

The ICDAR2015 dataset can be downloaded via link.

Extract all the datasets and make sure you organize them as follows:

- datasets
  | - CTW1500
  |   | - annotations
  |   | - ctwtest_text_image
  |   | - ctwtrain_text_image
  | - totaltext (or icdar2015)
  |   | - test_images
  |   | - train_images
  |   | - test.json
  |   | - train.json
  | - mlt2017 (or syntext1, syntext2)
      | - annotations
      | - images

After that, download the polygonal annotations along with the evaluation files, and extract them under the datasets folder.
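Before training, it can help to confirm that the folder layout shown above is in place. The following is a minimal sketch, not part of the repository: the `LAYOUT` table is copied from the tree in this section, the function name `missing_entries` is hypothetical, and the check does not cover the polygonal annotations or evaluation files because their file names are not listed here.

```python
from pathlib import Path

# Expected entries per dataset folder, copied from the directory tree above.
_JSON_STYLE = ["test_images", "train_images", "test.json", "train.json"]
_IMAGE_STYLE = ["annotations", "images"]
LAYOUT = {
    "CTW1500": ["annotations", "ctwtest_text_image", "ctwtrain_text_image"],
    "totaltext": _JSON_STYLE,
    "icdar2015": _JSON_STYLE,
    "mlt2017": _IMAGE_STYLE,
    "syntext1": _IMAGE_STYLE,
    "syntext2": _IMAGE_STYLE,
}


def missing_entries(root, datasets):
    """Return the expected paths (relative to `root`) that do not exist."""
    root = Path(root)
    missing = []
    for name in datasets:
        for entry in LAYOUT[name]:
            path = root / name / entry
            if not path.exists():
                missing.append(path.relative_to(root).as_posix())
    return missing


if __name__ == "__main__":
    # Check only the datasets you actually downloaded.
    for entry in missing_entries("datasets", ["totaltext", "CTW1500"]):
        print(f"missing: datasets/{entry}")
```

Only the datasets passed in the second argument are checked, so a partial download does not produce spurious warnings.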

Visualization Demo

You can try to visualize the predictions of the network using the following command:

python demo/demo.py --config-file <PATH_TO_CONFIG_FILE> --input <FOLDER_TO_INPUT_IMAGES> --output <OUTPUT_FOLDER> --opts MODEL.WEIGHTS <PATH_TO_MODEL_FILE> MODEL.TRANSFORMER.INFERENCE_TH_TEST 0.3

You may want to adjust INFERENCE_TH_TEST to filter out predictions with lower scores.
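To illustrate what a score threshold such as INFERENCE_TH_TEST does, here is a minimal sketch. It is not the repository's code: the prediction format (a list of dicts with a `score` key), the function name `filter_by_score`, and the use of `>=` rather than `>` are all assumptions made for illustration.

```python
from typing import Dict, List


def filter_by_score(predictions: List[Dict], threshold: float = 0.3) -> List[Dict]:
    """Keep only predictions whose confidence is at least `threshold`.

    The dict layout is hypothetical; it stands in for whatever the model returns.
    """
    return [p for p in predictions if p["score"] >= threshold]


if __name__ == "__main__":
    # Made-up predictions, only to show the effect of the threshold.
    preds = [
        {"text": "OPEN", "score": 0.92},
        {"text": "0PEN", "score": 0.25},
        {"text": "EXIT", "score": 0.41},
    ]
    for thr in (0.3, 0.5):
        kept = [p["text"] for p in filter_by_score(preds, thr)]
        print(f"threshold {thr}: {kept}")
```

Raising the threshold trades recall for precision: fewer low-confidence instances are drawn, at the cost of possibly dropping correct ones.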

Training

You can train from scratch, or finetune the model by putting pretrained weights in the weights folder.

Example command:

python tools/train_net.py --config-file <PATH_TO_CONFIG_FILE> --num-gpus 8

All configuration files can be found in configs/TESTR, excluding those files named Base-xxxx.yaml.

TESTR_R_50.yaml is the config for TESTR-Bezier, while TESTR_R_50_Polygon.yaml is for TESTR-Polygon.
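The naming rule above can be applied programmatically to list the configs that are meant to be passed to --config-file. This is a minimal sketch under assumptions: the function name `trainable_configs` is hypothetical, and the recursive search is used only because the exact subfolder structure of configs/TESTR is not described here.

```python
from pathlib import Path


def trainable_configs(config_dir="configs/TESTR"):
    """Return all .yaml files under `config_dir` except the Base-*.yaml files."""
    return sorted(
        p for p in Path(config_dir).rglob("*.yaml")
        if not p.name.startswith("Base-")
    )


if __name__ == "__main__":
    for cfg in trainable_configs():
        # Per the note above, a "_Polygon" suffix marks the TESTR-Polygon variant.
        variant = "TESTR-Polygon" if cfg.stem.endswith("_Polygon") else "TESTR-Bezier"
        print(f"{cfg}  ({variant})")
```

The Bezier label in the printout is inferred from the two file names mentioned above and may not hold for every config in the folder.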

Evaluation

python tools/train_net.py --config-file <PATH_TO_CONFIG_FILE> --eval-only MODEL.WEIGHTS <PATH_TO_MODEL_FILE>

Pretrained Models

Dataset    Annotation Type  Lexicon  Det-P  Det-R  Det-F  E2E-P  E2E-R  E2E-F  Link
Pretrain   Bezier           None     88.87  76.47  82.20  63.58  56.92  60.06  OneDrive
Pretrain   Polygonal        None     88.18  77.51  82.50  66.19  61.14  63.57  OneDrive
TotalText  Bezier           None     92.83  83.65  88.00  74.26  69.05  71.56  OneDrive
TotalText  Bezier           Full     -      -      -      86.42  80.35  83.28
TotalText  Polygonal        None     93.36  81.35  86.94  76.85  69.98  73.25  OneDrive
TotalText  Polygonal        Full     -      -      -      88.00  80.13  83.88
CTW1500    Bezier           None     89.71  83.07  86.27  55.44  51.34  53.31  OneDrive
CTW1500    Bezier           Full     -      -      -      83.05  76.90  79.85
CTW1500    Polygonal        None     92.04  82.63  87.08  59.14  53.09  55.95  OneDrive
CTW1500    Polygonal        Full     -      -      -      86.16  77.34  81.51
ICDAR15    Polygonal        None     90.31  89.70  90.00  65.49  65.05  65.27  OneDrive
ICDAR15    Polygonal        Strong   -      -      -      87.11  83.29  85.16
ICDAR15    Polygonal        Weak     -      -      -      80.36  78.38  79.36
ICDAR15    Polygonal        Generic  -      -      -      73.82  73.33  73.57
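The F columns in the table above can be cross-checked against the precision and recall columns. The sketch below assumes that Det-F and E2E-F are the usual harmonic mean of precision and recall, which this README does not state explicitly; the function name `f_measure` is chosen for illustration. Because the table entries are rounded, recomputed values may differ from the reported ones in the last digit.

```python
def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both given in percent)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # (precision, recall, reported F) copied from the Pretrain / Bezier row.
    rows = {
        "Det": (88.87, 76.47, 82.20),
        "E2E": (63.58, 56.92, 60.06),
    }
    for name, (p, r, reported) in rows.items():
        print(f"{name}: computed {f_measure(p, r):.2f}, reported {reported:.2f}")
```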

The Lite models only use the image feature from the last stage of ResNet.

Method            Annotation Type  Lexicon  Det-P  Det-R  Det-F  E2E-P  E2E-R  E2E-F  Link
Pretrain (Lite)   Polygonal        None     90.28  72.58  80.47  59.49  50.22  54.46  OneDrive
TotalText (Lite)  Polygonal        None     92.16  79.09  85.12  66.42  59.06  62.52  OneDrive

Citation

@InProceedings{Zhang_2022_CVPR,
    author    = {Zhang, Xiang and Su, Yongwen and Tripathi, Subarna and Tu, Zhuowen},
    title     = {Text Spotting Transformers},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2022},
    pages     = {9519-9528}
}

License

This repository is released under the Apache License 2.0. The license text can be found in the LICENSE file.

Acknowledgement

Thanks to AdelaiDet for a standardized training and inference framework, and Deformable-DETR for the implementation of multi-scale deformable cross-attention.