
A toolbox for Vietnamese Optical Character Recognition.

Primary language: C++. License: Apache License 2.0 (Apache-2.0).

An OCR Toolbox for Vietnamese Documents


This toolbox provides a pipeline for performing OCR on Vietnamese documents (such as receipts, personal IDs, licenses, etc.). The project is also designed to be flexible and easy to adapt.

📑 More information:

  • Report: link
  • YouTube: link

[Figure: Invoice (from SROIE19 dataset)]

[Figure: Personal ID (image from internet)]

Pipeline in detail:

  1. Apply the Canny edge detector, then detect contours.
  2. Extract the receipt from the image and normalize it.
  3. Use a Pixel Aggregation Network (PAN) to detect text regions in the extracted receipt, then crop these regions.
  4. Use VietOCR to recognize the text in each region, then perform word correction.
  5. Retrieve information.
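
The README does not say how the word correction in step 4 is done. One common approach, shown here only as an illustrative sketch and not as the toolbox's own method, is to snap each recognized token to the nearest entry in a domain vocabulary by Levenshtein distance. The functions `levenshtein` and `correct_word`, the `max_distance` threshold, and the sample vocabulary are all hypothetical.

```python
import unicodedata
from typing import Iterable


def levenshtein(a: str, b: str) -> int:
    """Edit distance where insertions, deletions and substitutions each cost 1."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, free when characters match
            ))
        prev = curr
    return prev[-1]


def _normalize(text: str) -> str:
    # NFC stores each Vietnamese letter and its diacritics as one code point,
    # so a missing accent counts as a single substitution.
    return unicodedata.normalize("NFC", text).lower()


def correct_word(word: str, vocabulary: Iterable[str], max_distance: int = 2) -> str:
    """Hypothetical helper: return the closest vocabulary entry, or `word` unchanged."""
    target = _normalize(word)
    best, best_dist = word, max_distance + 1
    for candidate in vocabulary:
        dist = levenshtein(target, _normalize(candidate))
        if dist < best_dist:
            best, best_dist = candidate, dist
    return best


vocab = ["Tổng cộng", "Thành tiền", "Ngày"]  # hypothetical receipt vocabulary
print(correct_word("Tong cong", vocab))  # Tổng cộng
```

The threshold keeps unrelated tokens (prices, codes) from being forced onto a dictionary word; a real system would likely tune it per field.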

Notebooks

  • Notebook for training PAN: Notebook

  • Notebook for training Transformer OCR: Notebook

  • Notebook for training PhoBERT: Notebook

  • Notebook for inference: Notebook

Pipeline

[Figure: Main Pipeline]

[Figure: Process Flow Block]

The pipeline has two stages (it can also run the second stage only):

  • The first stage detects and rectifies the document in the image, then passes it through the "process flow" to find the document's best orientation.
  • The second stage passes the rotated image through the entire "process flow" as usual to retrieve information.
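
The orientation search in the first stage can be pictured as trying each candidate rotation and keeping the best-scoring one. The sketch below assumes only right-angle rotations (0, 90, 180, 270 degrees) and takes a `score` callable as a stand-in for whatever quality measure the process flow produces; the README does not specify that measure. `rotate90` and `find_best_rotation` are hypothetical names, and images are toy 2-D lists rather than real image arrays.

```python
from typing import Callable, List, Tuple

Image = List[List[int]]  # toy grayscale image: a list of pixel rows


def rotate90(image: Image) -> Image:
    """Rotate a 2-D grid 90 degrees clockwise."""
    # Reverse the row order, then transpose.
    return [list(row) for row in zip(*image[::-1])]


def find_best_rotation(image: Image, score: Callable[[Image], float]) -> Tuple[int, Image]:
    """Return (angle, rotated image) with the highest score among 0/90/180/270 degrees."""
    best_angle, best_image, best_score = 0, image, score(image)
    current = image
    for angle in (90, 180, 270):
        current = rotate90(current)  # each step adds another 90 degrees clockwise
        s = score(current)
        if s > best_score:
            best_angle, best_image, best_score = angle, current, s
    return best_angle, best_image


# Toy score: prefer images whose bright content sits in the top row.
angle, upright = find_best_rotation([[0, 0], [9, 9]], lambda im: sum(im[0]))
print(angle, upright)  # 180 [[9, 9], [0, 0]]
```

Only the chosen rotation is then passed to the second stage, so the expensive full process flow runs once on the corrected image.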

Datasets


Pretrained weights

  • Pretrained PAN weights on SROIE19:

    Model            Image size   Weights   mAP@0.5   Pixel accuracy   IoU
    PAN (baseline)   640 x 640    link      0.71      0.95             0.91
    PAN (rotation)   640 x 640    link      0.66      0.93             0.88

  • Pretrained OCR weights on MCOCR2021:

    Model             Weights   Accuracy (full sequence)   Accuracy (per character)
    Transformer OCR   link      0.890                      0.981

  • Pretrained PhoBERT weights on MCOCR2021:

    Model     Weights   Accuracy (train)   Accuracy (val)
    PhoBERT   link      0.978              0.924
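
The tables report these metrics without defining them. The sketch below is written under two assumptions. First, "full sequence" and "per character" accuracy follow VietOCR's accuracy modes: the exact-match rate per text line, and the count of same-position character matches divided by label length, averaged over lines. Second, IoU is measured on the binary text mask. mAP@0.5 requires matching detected boxes to ground truth and is left out. All function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def full_sequence_accuracy(labels: Sequence[str], predictions: Sequence[str]) -> float:
    """Fraction of text lines whose prediction matches the label exactly."""
    exact = sum(1 for gt, pred in zip(labels, predictions) if gt == pred)
    return exact / len(labels)


def per_char_accuracy(labels: Sequence[str], predictions: Sequence[str]) -> float:
    """Mean over lines of (characters equal at the same position) / (label length)."""
    scores = []
    for gt, pred in zip(labels, predictions):
        if not gt:
            # Empty label: correct only if the prediction is empty too.
            scores.append(1.0 if not pred else 0.0)
            continue
        matches = sum(1 for g, p in zip(gt, pred) if g == p)
        scores.append(matches / len(gt))
    return sum(scores) / len(scores)


def pixel_accuracy_and_iou(gt_mask: List[List[int]], pred_mask: List[List[int]]) -> Tuple[float, float]:
    """Binary masks (1 = text, 0 = background) of equal size; IoU is for the text class."""
    tp = fp = fn = tn = 0
    for gt_row, pred_row in zip(gt_mask, pred_mask):
        for g, p in zip(gt_row, pred_row):
            if g and p:
                tp += 1
            elif p:
                fp += 1
            elif g:
                fn += 1
            else:
                tn += 1
    pixel_acc = (tp + tn) / (tp + fp + fn + tn)
    union = tp + fp + fn
    iou = tp / union if union else 1.0  # both masks empty: treat as perfect overlap
    return pixel_acc, iou
```

Per-character accuracy is usually much higher than full-sequence accuracy, as in the OCR table above, because a single wrong character makes a whole line count as wrong under the full-sequence metric.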

Inference

  • Install dependencies:

pip install -r requirements.txt

  • Full pipeline:

python run.py --input=<input image> --output=<output folder>
  • Extra parameters:
    • --debug: whether to save the output of each step
    • --find_best_rotation: whether to find the best rotation first (the first stage described under Pipeline)
    • --do_retrieve: whether to retrieve information (based on the classes defined in the config) or only perform OCR
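
To process a folder of images, one option is to call `run.py` once per image from a small Python script. This is a hedged sketch: it assumes the three extra parameters are on/off switches that take no value (inferred from the "whether to" descriptions), that inputs are `.jpg` files, and that each image should get its own output folder. `build_command` and `run_folder` are hypothetical helpers, not part of the toolbox.

```python
import subprocess
from pathlib import Path
from typing import List


def build_command(image: Path, output_dir: Path, debug: bool = False,
                  find_best_rotation: bool = False, do_retrieve: bool = False) -> List[str]:
    """Assemble one run.py invocation using the documented flags."""
    cmd = ["python", "run.py", f"--input={image}", f"--output={output_dir}"]
    # Assumption: the extra parameters are switches given without a value.
    if debug:
        cmd.append("--debug")
    if find_best_rotation:
        cmd.append("--find_best_rotation")
    if do_retrieve:
        cmd.append("--do_retrieve")
    return cmd


def run_folder(input_dir: Path, output_root: Path, **flags: bool) -> None:
    """Run the pipeline on every .jpg in input_dir, one output folder per image."""
    for image in sorted(input_dir.glob("*.jpg")):
        out = output_root / image.stem
        out.mkdir(parents=True, exist_ok=True)
        subprocess.run(build_command(image, out, **flags), check=True)
```

For example, `run_folder(Path("receipts"), Path("results"), find_best_rotation=True, do_retrieve=True)` would run both stages with information retrieval on each receipt. `check=True` stops the batch at the first failing image instead of silently continuing.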

References