This is the project repository for 'Bidirectional Scene Text Recognition with a Single Decoder', by Maurits Bleeker and Maarten de Rijke.
https://arxiv.org/abs/1912.03656
The base source-code of this project comes from: http://nlp.seas.harvard.edu/2018/04/03/attention.html
I tried to keep the code as general as possible, but some parts of the pipeline are written specifically for the environment I worked in.
To reproduce the results of the paper, please use the final model parameters, available at:
https://drive.google.com/file/d/1OwJ3iVpRhnjIZyOi7aOQIeLv7N1DHZkC/view?usp=sharing
The data_utils/ folder contains all the scripts to generate the train and test sets used in this paper.
Requirements:
- Python 3.7
- Pillow 5.4.1 / 7.0.0
- nltk 3.4.5 / 3.4.5
- numpy 1.17.1 / 1.18.1
- scipy 1.2.0 / 1.4.1
- seaborn 0.9.0 / 0.9.0
- tensorboard-logger 0.1.0
- tensorboardX 1.7 / 2.0
- torch 1.1.0.post2 / 1.3.1
- torchvision 0.2.1 / 0.4.2
- transformers 2.1.1 / 2.3.0
To run the code, set all configurations in Config.py and run main.py. The configurations needed to reproduce the results of the paper are already set in Config.py.
There are two options for loading the training/test data:
- From disk, using the annotation file(s).
- From a pickle file. The pickle file should contain a Python dict in the following format:
    {
        image_id: {
            'data': 'binary image string',
            'label': 'word'
        }
    }
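A minimal sketch of writing and reading a pickle in this format. It assumes that 'data' holds the raw encoded bytes of an image file (e.g. PNG or JPEG) and that the image_id keys are arbitrary unique strings; `build_pickle` and `load_pickle` are hypothetical helper names, not functions from this repository.

```python
import pickle


def build_pickle(samples, out_path):
    """Write samples to a pickle as {image_id: {'data': ..., 'label': ...}}.

    `samples` is an iterable of (image_id, image_path, label) tuples.
    Assumption: 'data' stores the raw encoded bytes of the image file.
    """
    data = {}
    for image_id, image_path, label in samples:
        with open(image_path, 'rb') as f:
            data[image_id] = {'data': f.read(), 'label': label}
    with open(out_path, 'wb') as f:
        pickle.dump(data, f)
    return data


def load_pickle(path):
    """Load the dict back; decoding the image bytes is left to the caller."""
    with open(path, 'rb') as f:
        return pickle.load(f)
```

The stored bytes can then be decoded with Pillow, for example `Image.open(io.BytesIO(entry['data']))`, which keeps the pickle independent of any particular image library.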
Each line of an annotation file has the format 'path/to/image.jpg annotation'. The image path is always relative to a root folder.
Example root folder: User/Documents/Project/data/IIITK/
The folder User/Documents/Project/data/IIITK/ contains annotation.txt and the images.
An example line from the annotation file:
test/1002_1.png private
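A minimal sketch of reading such an annotation file into (image path, label) pairs. It assumes one sample per line and image paths that contain no whitespace; `read_annotations` is a hypothetical helper, not the repository's own loader.

```python
import os


def read_annotations(root, annotation_file='annotation.txt'):
    """Parse 'path/to/image.jpg annotation' lines into (full_path, label) pairs.

    Assumptions: one sample per line, no whitespace inside the image path,
    and image paths are relative to `root`.
    """
    samples = []
    with open(os.path.join(root, annotation_file), encoding='utf-8') as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            # Split only on the first whitespace so the label stays intact.
            rel_path, label = line.split(maxsplit=1)
            samples.append((os.path.join(root, rel_path), label))
    return samples
```

For the example line above, this yields the path of test/1002_1.png joined to the root folder, paired with the label 'private'.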
All the scripts to process the originally provided datasets are in data_utils/.
If you find this code useful, please cite the following paper:
@article{bleeker2019bidirectional,
  title={Bidirectional Scene Text Recognition with a Single Decoder},
  author={Bleeker, Maurits and de Rijke, Maarten},
  journal={arXiv preprint arXiv:1912.03656},
  year={2019}
}