An OCR Toolbox for Vietnamese Documents

This toolbox provides a pipeline for OCR on Vietnamese documents (such as receipts, personal IDs, and licenses). The project is also designed to be flexible and easy to adapt.

📑 More information:

Report: link
YouTube:

Invoice (from SROIE19 dataset)

Personal ID (image from internet)

Pipeline in detail:

Apply the Canny edge detector, then detect contours.
Extract the receipt from the image and normalize it.
Use a Pixel Aggregation Network (PAN) to detect text regions in the extracted receipt, then crop these regions.
Use VietOCR to extract text from the cropped regions, then perform word correction.
Retrieve information.
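As an illustration of the word-correction step that follows VietOCR, here is a minimal sketch that snaps each recognized word to its closest entry in a small vocabulary. It is not the toolbox's own implementation: the function names, the vocabulary, and the 0.7 similarity cutoff are assumptions made for this example, and the matching uses Python's standard difflib rather than whatever method the project uses.

```python
import difflib
import unicodedata


def nfc(text):
    # Vietnamese diacritics can be encoded in composed or decomposed form;
    # normalizing to NFC keeps string comparisons consistent.
    return unicodedata.normalize("NFC", text)


def correct_word(word, vocabulary, cutoff=0.7):
    """Return the closest vocabulary entry, or the word unchanged if none is close."""
    word = nfc(word)
    vocab = [nfc(v) for v in vocabulary]
    if word in vocab:
        return word
    matches = difflib.get_close_matches(word, vocab, n=1, cutoff=cutoff)
    return matches[0] if matches else word


def correct_line(text, vocabulary, cutoff=0.7):
    """Correct every whitespace-separated word in an OCR output line."""
    return " ".join(correct_word(w, vocabulary, cutoff) for w in text.split())


# Hypothetical vocabulary of words expected on a receipt (assumption).
VOCAB = ["tổng", "cộng", "tiền", "ngày"]
print(correct_line("tổng tiềm", VOCAB))
```

A higher cutoff makes the correction more conservative: words that are not close to any vocabulary entry are passed through unchanged, which matters for prices, dates, and proper nouns.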

Notebooks

Notebook for training PAN:
Notebook for training Transformer OCR:
Notebook for training PhoBERT:
Notebook for inference:

Pipeline

Main Pipeline

Process Flow Block

There are two stages (the second stage can also be run on its own):

The first stage detects and rectifies the document in the image, then forwards it through the "process flow" to find the best orientation of the document.
The second stage forwards the rotated image through the entire "process flow" to retrieve information.
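The orientation search in the first stage can be pictured as trying each candidate rotation and keeping the one that scores best. The sketch below is only an illustration under assumptions: the image is a plain 2D list, the candidate angles are multiples of 90 degrees, and `score_fn` is a hypothetical callable standing in for whatever signal the "process flow" returns (for example, a mean OCR confidence).

```python
def rotate90(grid, k):
    """Rotate a 2D list counter-clockwise by k * 90 degrees."""
    for _ in range(k % 4):
        # Transpose, then reverse the row order: one counter-clockwise turn.
        grid = [list(row) for row in zip(*grid)][::-1]
    return grid


def find_best_rotation(image, score_fn, angles=(0, 90, 180, 270)):
    """Return (best_angle, rotated_image) for the highest-scoring rotation."""
    best_angle, best_score = None, float("-inf")
    for angle in angles:
        candidate = rotate90(image, angle // 90)
        score = score_fn(candidate)  # hypothetical: e.g. mean OCR confidence
        if score > best_score:
            best_angle, best_score = angle, score
    return best_angle, rotate90(image, best_angle // 90)


# Toy example: the "score" is simply the value of the top-left pixel.
toy_image = [[0, 0],
             [0, 1]]
angle, rotated = find_best_rotation(toy_image, lambda g: g[0][0])
print(angle, rotated)  # 180 [[1, 0], [0, 0]]
```

The rotated image returned here is what the second stage would then process in full.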

Datasets

MCOCR-2020 (for detection)
SROIE19 (for OCR and retrieval)

Pretrained weights

Pretrained PAN weights on SROIE19:

Model	Image Size	Weights	mAP@0.5	Pixel accuracy	IoU
PAN (baseline)	640 x 640	link	0.71	0.95	0.91
PAN (rotation)	640 x 640	link	0.66	0.93	0.88
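For readers unfamiliar with the segmentation metrics in this table, the following sketch shows how pixel accuracy and IoU are commonly computed on binary text masks. It assumes masks are 2D lists of 0/1 values with matching shapes; the reported numbers may have been produced with a different implementation or averaging scheme.

```python
def pixel_accuracy(pred, target):
    """Fraction of pixels where the predicted mask equals the ground truth."""
    total = correct = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            total += 1
            correct += int(p == t)
    return correct / total


def iou(pred, target):
    """Intersection over union of the foreground (non-zero) pixels."""
    intersection = union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            intersection += int(bool(p) and bool(t))
            union += int(bool(p) or bool(t))
    # Two empty masks are treated as a perfect match (a convention, not a rule).
    return intersection / union if union else 1.0


pred_mask = [[1, 1],
             [0, 0]]
true_mask = [[1, 0],
             [0, 0]]
print(pixel_accuracy(pred_mask, true_mask))  # 0.75
print(iou(pred_mask, true_mask))             # 0.5
```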

Pretrained OCR weights on MCOCR2021:

Model	Weights	Accuracy (full seq)	Accuracy (per char)
Transformer OCR	link	0.890	0.981
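The two accuracy columns measure different things: full-sequence accuracy counts a prediction as correct only if the whole string matches, while per-character accuracy gives partial credit. The sketch below uses one simple position-wise definition as an assumption; the figures above may have been computed with a different formula.

```python
def full_sequence_accuracy(predictions, targets):
    """Share of predictions that match their target string exactly."""
    exact = sum(p == t for p, t in zip(predictions, targets))
    return exact / len(targets)


def per_char_accuracy(predictions, targets):
    """Mean position-wise character match rate over all non-empty targets."""
    scores = []
    for p, t in zip(predictions, targets):
        if not t:
            continue  # skip empty targets to avoid dividing by zero
        matches = sum(a == b for a, b in zip(p, t))
        # Dividing by the longer string penalizes extra or missing characters.
        scores.append(matches / max(len(p), len(t)))
    return sum(scores) / len(scores)


preds = ["abc", "abd"]
truth = ["abc", "abc"]
print(full_sequence_accuracy(preds, truth))  # 0.5
print(per_char_accuracy(preds, truth))
```

In this toy case one of two strings is fully correct, while five of six characters match, which is why per-character accuracy is typically the higher of the two numbers.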

Pretrained PhoBERT weights on MCOCR2021:

Model	Weights	Accuracy (train)	Accuracy (val)
PhoBERT	link	0.978	0.924

Inference

Install dependencies: pip install -r requirements.txt
Run the full pipeline:

python run.py --input=<input image> --output=<output folder>

Extra Parameters:
- --debug: whether to save the output of each step
- --find_best_rotation: whether to find the best rotation first
- --do_retrieve: whether to retrieve information (based on class defined in config) or run OCR only
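When processing many images, it can be convenient to build the command from Python. The helper below is a hypothetical wrapper, not part of the toolbox. It assumes the three extra parameters are plain on/off switches that take no value; check run.py before relying on that.

```python
import shlex
import sys


def build_command(input_path, output_dir, debug=False,
                  find_best_rotation=False, do_retrieve=False):
    """Build the argument list for run.py using the flags documented above."""
    cmd = [sys.executable, "run.py",
           f"--input={input_path}", f"--output={output_dir}"]
    # Assumption: each extra parameter is a boolean switch without a value.
    if debug:
        cmd.append("--debug")
    if find_best_rotation:
        cmd.append("--find_best_rotation")
    if do_retrieve:
        cmd.append("--do_retrieve")
    return cmd


cmd = build_command("receipt.jpg", "results", debug=True, do_retrieve=True)
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` from the repository root.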

beta21s/vietnamese-ocr-toolbox