This is the code for the paper *Pushing the Performance Limit of Scene Text Recognizer without Human Annotation* (CVPR 2022). Tested with Python 3.7.
Install the dependencies with `pip install -r requirements.txt`.
Convert your own dataset to LMDB format with `create_dataset.py` (borrowed from https://github.com/bgshih/crnn/blob/master/tool/create_dataset.py, provided by Baoguang Shi).
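For reference, bgshih/crnn's `create_dataset.py` stores each sample under a 1-based, zero-padded key pair (`image-000000001`, `label-000000001`, ...) plus a single `num-samples` entry. The sketch below reproduces that layout, assuming this repository's copy keeps it. `build_lmdb_records` and `write_lmdb` are hypothetical helpers, not functions from the repository, and writing requires the third-party `lmdb` package.

```python
def build_lmdb_records(samples):
    """Build key/value pairs in the CRNN-style LMDB layout.

    samples: iterable of (image_bytes, label) pairs, e.g. the raw bytes of a
    word-image file and its transcription (hypothetical input format).
    """
    records = {}
    count = 0
    for count, (image_bytes, label) in enumerate(samples, start=1):
        # Keys use a 1-based, zero-padded index: image-000000001, label-000000001, ...
        records[b"image-%09d" % count] = image_bytes
        records[b"label-%09d" % count] = label.encode("utf-8")
    # The total number of samples is stored under its own key.
    records[b"num-samples"] = str(count).encode("utf-8")
    return records


def write_lmdb(output_path, records, map_size=1 << 30):
    """Write the records to an LMDB environment (requires `pip install lmdb`).

    map_size (1 GiB here) is an assumed default; it must be at least as large
    as the finished database.
    """
    import lmdb  # third-party; imported here so build_lmdb_records works without it

    env = lmdb.open(output_path, map_size=map_size)
    try:
        # The transaction commits when the with-block exits without an error.
        with env.begin(write=True) as txn:
            for key, value in records.items():
                txn.put(key, value)
    finally:
        env.close()
```

To read a dataset back, open it with `lmdb.open(path, readonly=True)` and fetch `txn.get(b"num-samples")` first. For Synth90K-scale data, raise `map_size` well above the 1 GiB default used here.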
For the labeled data, Synth90K and SynthText already converted to LMDB are provided by luyang-NWPU: [Here], password: tw3x.
For the unlabeled data, download raw images from ImageNet, Places2, and Open Images, then detect and crop word images from them.
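The README does not name a text detector, but whichever one you use, each detected word still has to become a crop. A minimal sketch, assuming the detector returns one polygon of `(x, y)` vertices per word: take the polygon's axis-aligned bounding box, pad it, clip it to the image, and discard boxes too small to read. `quad_to_crop_box`, `crop_boxes`, and the `pad`/`min_size` defaults are hypothetical choices for illustration.

```python
import math


def quad_to_crop_box(quad, width, height, pad=2, min_size=8):
    """Turn one detected word polygon into an axis-aligned crop box.

    quad: sequence of (x, y) vertices (hypothetical detector output).
    Returns (left, upper, right, lower) clipped to the image, or None if the
    clipped box is narrower or shorter than min_size pixels.
    """
    xs = [x for x, _ in quad]
    ys = [y for _, y in quad]
    # Round outward so the box fully contains the polygon, then pad and clip.
    left = max(0, math.floor(min(xs)) - pad)
    upper = max(0, math.floor(min(ys)) - pad)
    right = min(width, math.ceil(max(xs)) + pad)
    lower = min(height, math.ceil(max(ys)) + pad)
    if right - left < min_size or lower - upper < min_size:
        return None
    return (left, upper, right, lower)


def crop_boxes(detections, width, height, **kwargs):
    """Convert all detections of one image, dropping boxes that are too small."""
    boxes = (quad_to_crop_box(q, width, height, **kwargs) for q in detections)
    return [box for box in boxes if box is not None]
```

The returned tuples follow the `(left, upper, right, lower)` order that Pillow's `Image.crop` expects, so `Image.open(path).crop(box)` yields the word image. Axis-aligned boxes keep the sketch short; for strongly rotated text, a perspective warp (for example OpenCV's `getPerspectiveTransform` with `warpPerspective`) keeps more of the word.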
- Baseline training: `sh run_baseline.sh`
- Semi-supervised training with unlabeled data: `sh run_semi.sh`
- Testing: `sh run_test.sh`
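To drive the three stages from Python instead, a minimal sketch follows. It assumes the scripts sit in the repository root and are meant to run in the order listed; the README does not state whether `run_semi.sh` depends on the baseline checkpoint. `run_stages` is a hypothetical helper, and `dry_run=True` only returns the commands without executing them.

```python
import subprocess
from pathlib import Path

# Order as listed in the README (assumed, not stated, to be required).
STAGES = ("run_baseline.sh", "run_semi.sh", "run_test.sh")


def run_stages(repo_root=".", stages=STAGES, dry_run=False):
    """Run each shell script with `sh`, stopping at the first failure.

    Returns the list of commands that were run (or, with dry_run=True,
    that would be run).
    """
    root = Path(repo_root)
    commands = []
    for script in stages:
        cmd = ["sh", script]
        commands.append(cmd)
        if dry_run:
            continue
        if not (root / script).is_file():
            raise FileNotFoundError(root / script)
        # check=True raises CalledProcessError on a non-zero exit status,
        # so later stages never run after a failed one.
        subprocess.run(cmd, cwd=root, check=True)
    return commands
```

Fail-fast execution matters here because a failed training stage would otherwise let `run_test.sh` evaluate a stale or missing checkpoint.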
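Download the pretrained models from [here](https://pan.baidu.com/s/1JF97VY0oiPiK5GpDEsYioQ?pwd=abhr) (password: abhr) and put them in `saved_models`. Their accuracy (%) on the standard benchmarks is listed below; the number after a benchmark name is the size of its test set.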
Model | Labeled data | Unlabeled data | IC13 857 | IC13 1015 | SVT | IIIT | IC15 1811 | IC15 2077 | SVTP | CUTE |
---|---|---|---|---|---|---|---|---|---|---|
TRBA_pr | 10% (Synth90K+SynthText) | - | 96.3 | 94.3 | 91.5 | 94.3 | 81.5 | 77.7 | 84.2 | 87.5 |
TRBA_cr | 10% (Synth90K+SynthText) | 1.06M unlabeled real data | 97.2 | 95.9 | 94.3 | 96.6 | 86.8 | 79.7 | 89.0 | 93.4 |
TRBA_pr | Synth90K + SynthText | - | 97.2 | 95.9 | 91.8 | 95.5 | 83.6 | 79.7 | 87.3 | 88.5 |
TRBA_cr | Synth90K + SynthText | 10.6M unlabeled real data | 98.0 | 96.4 | 96.0 | 97.0 | 88.8 | 84.9 | 90.9 | 95.1 |
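To read off how much the unlabeled real data contributes, the sketch below copies the table's accuracies into Python and computes per-benchmark differences in percentage points. The dictionary keys (`"10%"`, `"full"`) and the `gains` helper are illustrative names, not part of the repository.

```python
BENCHMARKS = ["IC13 857", "IC13 1015", "SVT", "IIIT",
              "IC15 1811", "IC15 2077", "SVTP", "CUTE"]

# Accuracies (%) copied from the results table, same column order.
# "full" means all of Synth90K + SynthText as labeled data.
RESULTS = {
    ("TRBA_pr", "10%"): [96.3, 94.3, 91.5, 94.3, 81.5, 77.7, 84.2, 87.5],
    ("TRBA_cr", "10%"): [97.2, 95.9, 94.3, 96.6, 86.8, 79.7, 89.0, 93.4],
    ("TRBA_pr", "full"): [97.2, 95.9, 91.8, 95.5, 83.6, 79.7, 87.3, 88.5],
    ("TRBA_cr", "full"): [98.0, 96.4, 96.0, 97.0, 88.8, 84.9, 90.9, 95.1],
}


def gains(new, old):
    """Per-benchmark accuracy difference new - old, rounded to 0.1 points."""
    return {name: round(a - b, 1)
            for name, a, b in zip(BENCHMARKS, RESULTS[new], RESULTS[old])}


if __name__ == "__main__":
    for setting in ("10%", "full"):
        g = gains(("TRBA_cr", setting), ("TRBA_pr", setting))
        print(setting, g, "mean:", round(sum(g.values()) / len(g), 2))
```

Comparing `gains(("TRBA_cr", "10%"), ("TRBA_pr", "full"))` yields no negative entry: on all eight benchmarks, TRBA_cr trained with 10% of the labeled data plus 1.06M unlabeled real images matches or exceeds TRBA_pr trained on all of Synth90K and SynthText.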
This code is based on STR-Fewer-Labels by Jeonghun Baek. Many thanks for his contribution.