CDistNet: Perceiving Multi-Domain Character Distance for Robust Text Recognition

The official code of CDistNet.

CDistNet takes a different paradigm, and we are confident that it will continue to deliver high recognition performance across multiple scenarios. To that end, we explored the reasons for ABINet's high performance and found that separating the language model (LM) from the vision model (VM) gives ABINet additional performance gains. We have also applied training strategies to CDistNetv2 to find more room for improvement. Stay tuned for further updates to CDistNet.

To Do List

Two New Datasets

We test other SOTA methods on the HA-IC13 and CA-IC13 datasets.

CDistNet holds a performance advantage over other SOTA methods as the character distance increases (columns 1-6 in the tables below).

HA-IC13

Method	1	2	3	4	5	6	Code & Pretrained model
VisionLAN (ICCV 2021)	93.58	92.88	89.97	82.26	72.23	61.03	Official Code
ABINet (CVPR 2021)	95.92	95.22	91.95	85.76	73.75	64.99	Official Code
RobustScanner* (ECCV 2020)	96.15	95.33	93.23	88.91	81.10	71.53	--
Transformer-baseline*	96.27	95.45	92.42	86.46	79.35	72.46	--
CDistNet	96.62	96.15	94.28	89.96	83.43	77.71	--

CA-IC13

Method	1	2	3	4	5	6	Code & Pretrained model
VisionLAN (ICCV 2021)	94.87	92.77	84.01	75.03	64.29	52.74	Official Code
ABINet (CVPR 2021)	96.62	95.92	87.86	76.31	65.46	54.49	Official Code
RobustScanner* (ECCV 2020)	95.22	94.87	85.30	76.55	68.38	60.79	--
Transformer-baseline*	95.68	94.40	85.88	75.85	65.93	58.58	--
CDistNet	96.27	95.57	88.45	79.58	70.36	63.13	--
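
To make the comparison in the two tables concrete, the sketch below computes, for each character-distance level, the gap between CDistNet and the best of the other listed methods. The numbers are copied from the tables above; the helper name `margin_over_best` is only illustrative and is not part of the CDistNet code base.

```python
# Accuracy values copied from the HA-IC13 and CA-IC13 tables.
# List positions correspond to character-distance levels 1..6.
HA_IC13 = {
    "VisionLAN": [93.58, 92.88, 89.97, 82.26, 72.23, 61.03],
    "ABINet": [95.92, 95.22, 91.95, 85.76, 73.75, 64.99],
    "RobustScanner*": [96.15, 95.33, 93.23, 88.91, 81.10, 71.53],
    "Transformer-baseline*": [96.27, 95.45, 92.42, 86.46, 79.35, 72.46],
    "CDistNet": [96.62, 96.15, 94.28, 89.96, 83.43, 77.71],
}
CA_IC13 = {
    "VisionLAN": [94.87, 92.77, 84.01, 75.03, 64.29, 52.74],
    "ABINet": [96.62, 95.92, 87.86, 76.31, 65.46, 54.49],
    "RobustScanner*": [95.22, 94.87, 85.30, 76.55, 68.38, 60.79],
    "Transformer-baseline*": [95.68, 94.40, 85.88, 75.85, 65.93, 58.58],
    "CDistNet": [96.27, 95.57, 88.45, 79.58, 70.36, 63.13],
}


def margin_over_best(table, method="CDistNet"):
    """Per-level gap between `method` and the best of the other methods."""
    others = [scores for name, scores in table.items() if name != method]
    best_other = [max(column) for column in zip(*others)]
    return [round(own - best, 2) for own, best in zip(table[method], best_other)]


print("HA-IC13:", margin_over_best(HA_IC13))
print("CA-IC13:", margin_over_best(CA_IC13))
```

On HA-IC13 the margin grows from 0.35 points at level 1 to 5.25 points at level 6. On CA-IC13, ABINet is ahead by 0.35 points at levels 1 and 2, and CDistNet leads from level 3 onward.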

Datasets

The datasets are the same as those used by ABINet.

Training datasets:
1. MJSynth (MJ):
  - LMDB dataset: BaiduNetdisk (passwd:n23k)
2. SynthText (ST):
  - LMDB dataset: BaiduNetdisk (passwd:n23k)
Evaluation & test datasets (the LMDB datasets can be downloaded from BaiduNetdisk (passwd:1dbv) or GoogleDrive):
1. ICDAR 2013 (IC13)
2. ICDAR 2015 (IC15)
3. IIIT5K Words (IIIT)
4. Street View Text (SVT)
5. Street View Text-Perspective (SVTP)
6. CUTE80 (CUTE)
Augmented IC13:
- HA-IC13 & CA-IC13: BaiduNetdisk (passwd:d6jd), GoogleDrive
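
After downloading, it can help to read a single sample back from an LMDB directory as a sanity check. The following is a minimal sketch, assuming the files follow the key layout commonly used by ABINet-style LMDB datasets (`num-samples`, `image-%09d`, `label-%09d`, with 1-based indices). That layout is an assumption and is not stated in this README; the function names are illustrative.

```python
import io


def sample_keys(index):
    """Return the (image, label) LMDB keys for a 1-based sample index.

    Assumes the 'image-%09d' / 'label-%09d' key convention.
    """
    if index < 1:
        raise ValueError("sample indices start at 1 in this layout")
    return f"image-{index:09d}".encode(), f"label-{index:09d}".encode()


def read_sample(lmdb_dir, index):
    """Read one (PIL image, label string) pair from an LMDB directory."""
    import lmdb  # imported lazily so sample_keys works without lmdb installed
    from PIL import Image

    env = lmdb.open(lmdb_dir, readonly=True, lock=False,
                    readahead=False, meminit=False)
    try:
        with env.begin(write=False) as txn:
            raw_total = txn.get(b"num-samples")
            if raw_total is None:
                raise KeyError("'num-samples' not found; layout assumption is wrong")
            total = int(raw_total)
            if index > total:
                raise IndexError(f"index {index} exceeds {total} samples")
            image_key, label_key = sample_keys(index)
            # convert() forces the image to load before the environment closes
            image = Image.open(io.BytesIO(txn.get(image_key))).convert("RGB")
            label = txn.get(label_key).decode("utf-8")
    finally:
        env.close()
    return image, label
```

For example, `read_sample("dataset/eval/IC13_857", 1)` would return the first image and its label if the layout assumption holds for that directory.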

The structure of the dataset directory is:

dataset
├── eval
│   ├── CUTE80
│   ├── IC13_857
│   ├── IC15_1811
│   ├── IIIT5k_3000
│   ├── SVT
│   └── SVTP
└── train
    ├── MJ
    │   ├── MJ_test
    │   ├── MJ_train
    │   └── MJ_valid
    └── ST
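
A quick way to confirm that the downloaded data matches this layout is to check for each expected sub-directory. The sketch below uses only the standard library; the directory names are taken from the tree above, while the function name and the default root `dataset` are illustrative assumptions.

```python
from pathlib import Path

# Sub-directories listed in the dataset tree, relative to the dataset root.
EXPECTED_DIRS = [
    "eval/CUTE80",
    "eval/IC13_857",
    "eval/IC15_1811",
    "eval/IIIT5k_3000",
    "eval/SVT",
    "eval/SVTP",
    "train/MJ/MJ_test",
    "train/MJ/MJ_train",
    "train/MJ/MJ_valid",
    "train/ST",
]


def missing_dataset_dirs(root="dataset"):
    """Return the expected sub-directories that do not exist under `root`."""
    root = Path(root)
    return [rel for rel in EXPECTED_DIRS if not (root / rel).is_dir()]
```

An empty return value means every directory in the tree is present. Any returned entries name the parts that still need to be downloaded or moved into place.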

Environment

The required packages are listed in env_cdistnet.yaml.

# Install
conda create -n CDistNet python=3.7
conda activate CDistNet
conda install pytorch==1.5.1 torchvision==0.6.1 cudatoolkit=9.2 -c pytorch
pip install opencv-python mmcv notebook numpy einops tensorboardX Pillow thop timm tornado tqdm matplotlib lmdb
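
Once the commands above have finished, the short sketch below can confirm that the packages are importable from the active environment. It uses only the standard library. The module list mirrors the install commands, with the usual import names substituted where they differ from the pip names (`cv2` for opencv-python, `PIL` for Pillow). The function name is illustrative.

```python
import importlib.util

# Import names for the packages installed above. Some differ from the
# pip package names: opencv-python -> cv2, Pillow -> PIL.
REQUIRED_MODULES = [
    "torch", "torchvision", "cv2", "mmcv", "notebook", "numpy", "einops",
    "tensorboardX", "PIL", "thop", "timm", "tornado", "tqdm",
    "matplotlib", "lmdb",
]


def missing_modules(names=REQUIRED_MODULES):
    """Return the module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]
```

Calling `missing_modules()` inside the CDistNet environment should return an empty list. Any names it returns point to packages that still need to be installed.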

Pretrained Models

Get the pretrained models from BaiduNetdisk (passwd:d6jd) or GoogleDrive. (We also provide the training log and result.csv in the same file.) The pretrained model should be placed in models/reconstruct_CDistNetv3_3_10.

The performance of the pretrained models is summarized as follows:

Train

CUDA_VISIBLE_DEVICES=0,1,2,3 python train.py --config=configs/CDistNet_config.py

Eval

CUDA_VISIBLE_DEVICES=0 python eval.py --config=configs/CDistNet_config.py

Citation

@article{Zheng2021CDistNetPM,
  title={CDistNet: Perceiving Multi-Domain Character Distance for Robust Text Recognition},
  author={Tianlun Zheng and Zhineng Chen and Shancheng Fang and Hongtao Xie and Yu-Gang Jiang},
  journal={ArXiv},
  year={2021},
  volume={abs/2111.11011}
}
