This toolbox provides an OCR pipeline for Vietnamese documents (such as receipts, personal IDs, and licenses). The project is also designed to be flexible and adaptable.
📑 More information:
- Report: link
- YouTube:
Invoice (from SROIE19 dataset)
Personal ID (image from internet)
Pipeline in detail:
- Use the Canny edge detector, then detect contours.
- Extract the receipt from the image and normalize it.
- Use a Pixel Aggregation Network (PAN) to detect text regions in the extracted receipt, then crop these regions.
- Use VietOCR to extract text from the cropped regions, then perform word correction.
- Retrieve information.
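The word-correction step is not described in detail here. As a rough illustration only, the sketch below corrects OCR output against a small vocabulary using closest-match lookup from Python's standard `difflib` module. The vocabulary, the `correct_word` and `correct_text` helpers, and the `cutoff` value are all hypothetical and are not taken from this project.

```python
from difflib import get_close_matches

# Hypothetical vocabulary (assumption); a real one would come from the target domain.
VOCAB = ["tổng", "cộng", "tiền", "ngày", "hóa", "đơn"]


def correct_word(word, vocab=VOCAB, cutoff=0.6):
    """Return the closest vocabulary entry, or the word unchanged if none is close enough.

    Note: a corrected word is returned in the vocabulary's casing.
    """
    matches = get_close_matches(word.lower(), vocab, n=1, cutoff=cutoff)
    return matches[0] if matches else word


def correct_text(text, vocab=VOCAB, cutoff=0.6):
    """Correct each whitespace-separated token of an OCR string independently."""
    return " ".join(correct_word(w, vocab, cutoff) for w in text.split())
```

A higher `cutoff` makes the correction more conservative: fewer tokens are replaced, but fewer valid words (names, amounts) are wrongly overwritten.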
Main Pipeline
Process Flow Block
There are two stages (the pipeline can also run the second stage only):
- The first stage detects and rectifies the document in the image, then forwards it through the "process flow" to find the best orientation of the document.
- The second stage forwards the rotated image through the entire "process flow" as usual to retrieve information.
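The two stages above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: `rotate`, `score_fn`, and `process_flow` are hypothetical callables supplied by the caller, the candidate angles are an assumption, and how orientations are actually scored is not stated here (a mean OCR confidence would be one plausible choice).

```python
def find_best_rotation(image, rotate, score_fn, angles=(0, 90, 180, 270)):
    """Try each candidate angle and keep the one with the highest score.

    `rotate(image, angle)` and `score_fn(image)` are hypothetical callables;
    the set of candidate angles is an assumption.
    """
    best_angle, best_score = None, float("-inf")
    for angle in angles:
        candidate = rotate(image, angle)
        score = score_fn(candidate)
        if score > best_score:
            best_angle, best_score = angle, score
    return best_angle, best_score


def run_two_stage(image, rotate, score_fn, process_flow, find_rotation=True):
    """Stage 1 (optional): pick the best orientation. Stage 2: run the full flow."""
    if find_rotation:
        angle, _ = find_best_rotation(image, rotate, score_fn)
        image = rotate(image, angle)
    return process_flow(image)
```

Passing `find_rotation=False` corresponds to running the second stage only.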
- MCOCR-2020 (for detection)
- SROIE19 (for ocr and retrieval)
- Pretrained PAN weights on SROIE19:
Model | Image Size | Weights | mAP@0.5 | Pixel accuracy | IoU |
---|---|---|---|---|---|
PAN (baseline) | 640 x 640 | link | 0.71 | 0.95 | 0.91 |
PAN (rotation) | 640 x 640 | link | 0.66 | 0.93 | 0.88 |
- Pretrained OCR weights on MCOCR2021:
Model | Weights | Accuracy (full seq) | Accuracy (per char) |
---|---|---|---|
Transformer OCR | link | 0.890 | 0.981 |
- Pretrained PhoBERT weights on MCOCR2021:
Model | Weights | Accuracy (train) | Accuracy (val) |
---|---|---|---|
PhoBERT | link | 0.978 | 0.924 |
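For readers unfamiliar with the two OCR accuracy columns reported above, the sketch below shows one common way to compute full-sequence and per-character accuracy. The exact definitions used for the reported numbers are not stated here, so treat this as an assumption; the function names are hypothetical.

```python
def full_sequence_accuracy(preds, labels):
    """Fraction of samples whose predicted string matches the label exactly."""
    if not labels:
        return 0.0
    correct = sum(p == t for p, t in zip(preds, labels))
    return correct / len(labels)


def per_char_accuracy(preds, labels):
    """Position-wise character matches over label length, averaged across samples."""
    if not labels:
        return 0.0
    total = 0.0
    for p, t in zip(preds, labels):
        if not t:
            # An empty label counts as correct only if the prediction is empty too.
            total += 1.0 if not p else 0.0
            continue
        matches = sum(a == b for a, b in zip(p, t))
        total += matches / len(t)
    return total / len(labels)
```

Under this definition a single wrong character fails the whole sequence for the first metric but only costs one position for the second, which is why the per-character figure is typically the higher of the two.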
- Install dependencies:
pip install -r requirements.txt
- Full pipeline:
python run.py --input=<input image> --output=<output folder>
- Extra Parameters:
- --debug: whether to save the output of each step
- --find_best_rotation: whether to find the best rotation first
- --do_retrieve: whether to retrieve information (based on the class defined in the config) or run OCR only
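As an illustration of how these options fit together, here is a minimal `argparse` sketch. It assumes the three extra parameters are boolean switches and that `--input` and `--output` are required; the actual argument handling in `run.py` may differ, and `build_parser` is a hypothetical helper.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented flags (assumed to be boolean switches)."""
    parser = argparse.ArgumentParser(description="Run the document OCR pipeline")
    parser.add_argument("--input", required=True, help="path to the input image")
    parser.add_argument("--output", required=True, help="path to the output folder")
    parser.add_argument("--debug", action="store_true",
                        help="save the output of each step")
    parser.add_argument("--find_best_rotation", action="store_true",
                        help="find the best rotation first")
    parser.add_argument("--do_retrieve", action="store_true",
                        help="retrieve information instead of running OCR only")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args)
```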