UTRNet: High-Resolution Urdu Text Recognition

Official Implementation of the paper "UTRNet: High-Resolution Urdu Text Recognition In Printed Documents"

Using This Repository

Environment

Python 3.7
PyTorch 1.9.1+cu111
Torchvision 0.10.1+cu111
CUDA 11.4

Installation

Clone the repository

git clone https://github.com/abdur75648/UTRNet-High-Resolution-Urdu-Text-Recognition.git

Install the requirements

conda create -n urdu_ocr python=3.7
conda activate urdu_ocr
pip3 install -r requirements.txt -f https://download.pytorch.org/whl/torch_stable.html

Running the code

Training

python3 train.py --train_data path/to/LMDB/data/folder/train/ --valid_data path/to/LMDB/data/folder/val/ --FeatureExtraction HRNet --SequenceModeling DBiLSTM --Prediction CTC --exp_name UTRNet-Large --num_epochs 100 --batch_size 8

Testing

CUDA_VISIBLE_DEVICES=0 python test.py --eval_data path/to/LMDB/data/folder/test/ --FeatureExtraction HRNet --SequenceModeling DBiLSTM --Prediction CTC --saved_model saved_models/UTRNet-Large/best_norm_ED.pth

Character-wise Accuracy Testing

To create a character-wise accuracy table in a CSV file, run the following command:

CUDA_VISIBLE_DEVICES=0 python3 char_test.py --eval_data path/to/LMDB/data/folder/test/ --FeatureExtraction HRNet --SequenceModeling DBiLSTM --Prediction CTC  --saved_model saved_models/UTRNet-Large/best_norm_ED.pth

Visualize the result by running char_test_vis

Reading individual images

To read a single image, run the following command

CUDA_VISIBLE_DEVICES=0 python3 read.py --image_path path/to/image.png --FeatureExtraction HRNet --SequenceModeling DBiLSTM --Prediction CTC  --saved_model saved_models/UTRNet-Large/best_norm_ED.pth
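
The command above reads one image per call. To process a whole folder, one option is to build one such command per image. The sketch below is not part of the repository: the helper name build_read_commands and the folder layout are hypothetical, and it uses only the flags shown in the command above. It assumes read.py is in the current working directory.

```python
import subprocess
import sys
from pathlib import Path

# Flags copied from the read.py command above; adjust them to match your model.
MODEL_FLAGS = [
    "--FeatureExtraction", "HRNet",
    "--SequenceModeling", "DBiLSTM",
    "--Prediction", "CTC",
    "--saved_model", "saved_models/UTRNet-Large/best_norm_ED.pth",
]


def build_read_commands(image_dir, pattern="*.png"):
    """Return one read.py argument list per matching image (hypothetical helper)."""
    images = sorted(Path(image_dir).glob(pattern))
    return [
        [sys.executable, "read.py", "--image_path", str(image)] + MODEL_FLAGS
        for image in images
    ]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 read_folder.py path/to/images/
    for command in build_read_commands(sys.argv[1]):
        # check=True stops at the first image that read.py fails on.
        subprocess.run(command, check=True)
```

Each call starts a new Python process and reloads the model, so this is convenient rather than fast. The child processes inherit the environment, so setting CUDA_VISIBLE_DEVICES=0 before running the script still applies.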

Visualisation of Saliency Maps

To visualize the saliency maps for an input image, run the following command:

python3 vis_salency.py --FeatureExtraction HRNet --SequenceModeling DBiLSTM --Prediction CTC --saved_model saved_models/UTRNet-Large/best_norm_ED.pth --vis_dir vis_feature_maps --image_path path/to/image.png

Dataset

Create your own lmdb dataset

pip3 install fire
python create_lmdb_dataset.py --inputPath data/ --gtFile data/gt.txt --outputPath result/train

The data folder should be structured as follows:

data
├── gt.txt
└── test
    ├── word_1.png
    ├── word_2.png
    ├── word_3.png
    └── ...

Each line of gt.txt should have the form {imagepath}\t{label}\n, with a tab character between the image path and the label.
For example:

test/word_1.png label1
test/word_2.png label2
test/word_3.png label3
...
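
Before building the LMDB dataset, it can help to check gt.txt against the format described above. The following is a minimal sketch and not part of the repository: the function name check_gt_file is hypothetical, and it assumes that each line holds exactly one tab and that image paths are relative to the folder passed as --inputPath.

```python
from pathlib import Path


def check_gt_file(gt_path, image_root):
    """Return a list of (line_number, problem) tuples for a gt.txt file."""
    problems = []
    root = Path(image_root)
    with open(gt_path, encoding="utf-8") as handle:
        for number, line in enumerate(handle, start=1):
            line = line.rstrip("\r\n")
            if not line:
                continue  # blank lines are skipped, not reported
            parts = line.split("\t")
            if len(parts) != 2:
                problems.append((number, "expected exactly one tab separator"))
                continue
            image_path, label = parts
            if not label:
                problems.append((number, "empty label"))
            if not (root / image_path).is_file():
                problems.append((number, "image not found: " + image_path))
    return problems
```

For the layout shown above, check_gt_file("data/gt.txt", "data") returns an empty list when every line is well formed and every referenced image exists.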

Downloads

Trained Models

Datasets

UTRSet-Real
UTRSet-Synth
IIITH (Updated) (Original)
UPTI (Source)
UrduDoc - Will be made available subject to the execution of a no-cost license agreement. Please contact the authors for the same.

Text Detection (Supplementary)

The text detection inference code & model based on ContourNet is here. As mentioned in the paper, it may be integrated with UTRNet for a combined text detection+recognition and hence an end-to-end Urdu OCR.

End-To-End Urdu OCR Webtool

This was developed by integrating the UTRNet text recognition model with a text detection model for end-to-end Urdu OCR. The webtool may be made available only for non-commercial use upon request. Please contact the authors for the same.

Updates

01/01/21 - Project Initiated
21/11/22 - Abstract accepted at WIDAFIL-ICFHR 2022
12/12/22 - Repository Created
20/12/22 - Results Updated
19/04/23 - Paper accepted at ICDAR 2023

Acknowledgements

This repository is based on deep-text-recognition-benchmark
We acknowledge the Rekhta Foundation and the personal collections of Arjumand Ara for the scanned images and Noor Fatima and Mohammed Usman for the dataset and the manual transcription.
We also thank all the members of the Computer Vision Group, IIT Delhi for their support and guidance.

Contact

Note

This is the official repository of the project. The copyright of the dataset, code & models belongs to the authors. They are for research purposes only and must not be used for any other purpose without the authors' explicit permission.

Citation

If you use the code/dataset, please cite the following paper:

@article{rahman2023utrnet,
      title={UTRNet: High-Resolution Urdu Text Recognition In Printed Documents}, 
      author={Abdur Rahman and Arjun Ghosh and Chetan Arora},
      journal={arXiv preprint arXiv:2306.15782},
      year={2023},
      eprint={2306.15782},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      doi = {https://doi.org/10.48550/arXiv.2306.15782},
      url = {https://arxiv.org/abs/2306.15782}
}

License

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License for Noncommercial (academic & research) purposes only and must not be used for any other purpose without the authors' explicit permission.
