📋 This repository contains the data and implementation of the ECCV 2020 paper *Adaptive Text Recognition through Visual Matching*.
This work addresses the problems of generalization and flexibility for text recognition in documents.
We introduce a new model that exploits the repetitive nature of characters in languages, and decouples the visual decoding and linguistic modelling stages through intermediate representations in the form of similarity maps. By doing this, we turn text recognition into a visual matching problem, thereby achieving one-shot sequence recognition.
The model can handle challenges that traditional architectures cannot solve without expensive retraining: (i) it can change the number of classes simply by changing the exemplars; and (ii) it can generalize to novel languages and characters (not in the training data) simply by providing a new glyph exemplar set. We also demonstrate that the model can generalize to unseen fonts without requiring new exemplars from them.
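To make the idea concrete, here is a minimal sketch (not the repository's code) of the kind of similarity map the model builds between glyph-exemplar features and text-line features; the feature dimensions and function names are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def similarity_map(glyph_feats, line_feats):
    """Cosine-similarity map between glyph-exemplar features and text-line features.

    glyph_feats: (num_glyph_cols, dim) features of the concatenated glyph exemplars
    line_feats:  (num_line_cols, dim)  features of the rendered text line
    returns:     (num_glyph_cols, num_line_cols) similarity map
    """
    g = F.normalize(glyph_feats, dim=-1)
    t = F.normalize(line_feats, dim=-1)
    return g @ t.t()

# toy example: 26 exemplar columns vs. 100 text-line columns, 64-dim features
sim = similarity_map(torch.randn(26, 64), torch.randn(100, 64))
print(sim.shape)  # torch.Size([26, 100])
```

Because decoding operates on this map rather than on fixed class logits, swapping the exemplar set changes the number of rows (classes) without retraining.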
- Clone this repository:

```bash
git clone https://github.com/Chuhanxx/FontAdaptor.git
```
- Create a conda virtual environment and install the requirements (this implementation requires CUDA and Python > 3.7):

```bash
cd FontAdaptor
source build_venv.sh
```
- Download the data for training and evaluation (the dataset contains FontSynth + Omniglot):

```bash
source download_data.sh
```
- Download our pre-trained model on four font attributes + Omniglot.
Test the model using test fonts as exemplars:

```bash
python test.py --evalset FontSynth --root ./data --model_folder /PATH/TO/CHECKPOINT
```

Test the model using randomly chosen training fonts as exemplars:

```bash
python test.py --evalset FontSynth --root ./data --model_folder /PATH/TO/CHECKPOINT --cross
```

Test the model on Omniglot:

```bash
python test.py --evalset Omniglot --root ./data --model_folder /PATH/TO/CHECKPOINT
```
You can visualize the model's predictions by enabling `--visualize`.
Coming soon
Our FontSynth dataset (16GB) can be downloaded directly from here.
We take 1444 fonts from the MJSynth dataset and split them into five categories by their appearance attributes as determined from their names: (1) regular, (2) bold, (3) italic, (4) light, and (5) others (i.e., all fonts with none of the first four attributes in their name).
For training, we select 50 fonts at random from each split and generate 1000 text-line and glyph images for each font. For testing, we use all the 251 fonts in category (5).
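For intuition only, here is a small sketch of how such a name-based split could be implemented; the attribute keywords follow the description above, but the function and its handling of ambiguous names are illustrative assumptions, not the dataset-generation code.

```python
import random

ATTRIBUTES = ("regular", "bold", "italic", "light")

def split_fonts_by_name(font_names, per_split=50, seed=0):
    """Toy split: assign each font to the first attribute found in its name, else 'others'."""
    splits = {attr: [] for attr in (*ATTRIBUTES, "others")}
    for name in font_names:
        lowered = name.lower()
        attr = next((a for a in ATTRIBUTES if a in lowered), "others")
        splits[attr].append(name)
    rng = random.Random(seed)
    # sample training fonts from the four attribute splits; 'others' is held out for testing
    train = {a: rng.sample(splits[a], min(per_split, len(splits[a]))) for a in ATTRIBUTES}
    return train, splits["others"]
```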
The structure of this dataset is:

```
ims/
    font1/
    font2/
    ...
gt/
    train/
        train_regular_50_resample.txt
    test/
    val/
    test_FontSynth.txt
    train_att4.txt
    ...
fontlib/
    googlefontdirectory/
    ...
```
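After downloading, a quick illustrative check (not part of the repository) that the top-level folders described above exist, assuming the dataset is extracted under `./data`:

```python
from pathlib import Path

def check_fontsynth_layout(root="./data"):
    """Verify that the expected top-level FontSynth folders are present."""
    missing = [d for d in ("ims", "gt", "fontlib") if not (Path(root) / d).is_dir()]
    if missing:
        raise FileNotFoundError(f"missing folders under {root}: {missing}")
    print("FontSynth layout looks OK")

check_fontsynth_layout()
```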
In folder `gt`, there are txt files with lines in the following format:

```
font_name img_name gt_sentence (H,W)
```
For training, each line corresponds to a text-line image with path `ims/font_name/lines_byclusters/img_name`.
For testing, it corresponds to a text-line image with path `ims/font_name/test_new/img_name`.
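As a rough illustration of the format above, here is a sketch of parsing one ground-truth line and building the image path; the handling of sentences containing spaces and of the `(H,W)` field is an assumption, not the repository's data loader.

```python
import os
import re

def parse_gt_line(line, root="./data/ims", split_dir="lines_byclusters"):
    """Parse 'font_name img_name gt_sentence (H,W)' into (image_path, sentence, (H, W)).

    Assumes the first two whitespace-separated fields are the font and image names,
    the final '(H,W)' field is the image size, and everything in between is the sentence.
    """
    font_name, img_name, rest = line.strip().split(maxsplit=2)
    sentence, size_str = rest.rsplit(maxsplit=1)
    h, w = map(int, re.match(r"\((\d+),\s*(\d+)\)", size_str).groups())
    return os.path.join(root, font_name, split_dir, img_name), sentence, (h, w)

# hypothetical example line; use split_dir="test_new" for the test annotations
path, gt, size = parse_gt_line("font1 0001.jpg hello world (32,256)")
print(path, gt, size)
```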
`gt/train_att4.txt` and the corresponding test-font list in `gt` give the fonts selected for training and testing; the source files of these fonts can be found in `fontlib`.
If you use this code or data, please cite the following paper:
```bibtex
@inproceedings{zhang2020Adaptive,
  title={Adaptive Text Recognition through Visual Matching},
  author={Chuhan Zhang and Ankush Gupta and Andrew Zisserman},
  booktitle={European Conference on Computer Vision (ECCV)},
  year={2020}
}
```
If you have any questions, please contact czhang@robots.ox.ac.uk.