/MFCNN


Research on Information Extraction based on Semantic Segmentation in OCR

Research Documentation

Based on Rules

A set of general rules for extracting information from forms was designed and implemented independently.
On more than 100 customs declaration invoices, key-value pair extraction reaches an accuracy of 98%.
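To make the idea of rule-based key-value extraction concrete, here is a minimal sketch. It is not the rule set used in this project: the single "label, colon, value" rule, the function names `extract_key_values` and `accuracy`, and the sample lines are all assumptions chosen for illustration.

```python
import re

# Hypothetical rule: a short label, then ':' or '：', then the value.
KEY_VALUE_RULE = re.compile(
    r"^\s*(?P<key>[^:：]{1,40}?)\s*[:：]\s*(?P<value>.+?)\s*$"
)


def extract_key_values(ocr_lines):
    """Return the key-value pairs found in a list of OCR text lines."""
    pairs = {}
    for line in ocr_lines:
        match = KEY_VALUE_RULE.match(line)
        if match:
            pairs[match.group("key")] = match.group("value")
    return pairs


def accuracy(predicted, expected):
    """Fraction of expected keys whose predicted value matches exactly."""
    if not expected:
        return 0.0
    correct = sum(1 for key, value in expected.items()
                  if predicted.get(key) == value)
    return correct / len(expected)


if __name__ == "__main__":
    # Made-up OCR output, not taken from the project's invoices.
    sample = ["Invoice No: INV-001", "  Total Amount : 1,250.00 USD  "]
    print(extract_key_values(sample))
```

A real rule set would likely also use layout cues such as box positions and table borders, which plain text lines do not carry.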

Based on Neural Network

A BERT-based MFCNN is reproduced. A series of experiments covers training set requirements,
training cost, model generalization, fine-tuning for downstream tasks, and parameter tuning.

Based on Machine Learning

With feature engineering for LightGBM, the F1 score in the one-shot learning setting exceeds 0.9.
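For reference, the F1 score is the harmonic mean of precision and recall. The sketch below computes it from raw counts. The function name `f1_score` and its count-based signature are assumptions for illustration, and the README does not state how the project aggregates the score (per field, micro, or macro).

```python
def f1_score(true_positives, false_positives, false_negatives):
    """Harmonic mean of precision and recall, computed from raw counts."""
    predicted = true_positives + false_positives
    actual = true_positives + false_negatives
    if predicted == 0 or actual == 0:
        return 0.0  # no predictions or no positives: define F1 as 0
    precision = true_positives / predicted
    recall = true_positives / actual
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

Because the harmonic mean is dominated by the smaller of its two inputs, an F1 score above 0.9 requires both precision and recall to be high.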

Learning to Extract Semantic Structure from Documents Using Multimodal Fully Convolutional Neural Networks

Paper
Paper Documentation

Model

  • Link
  • Place the downloaded model in the models folder inside the mfcn folder

Auxiliary Files for PCA Dimension Reduction

  • Link
  • Place the downloaded dimension reduction auxiliary files in the auxiliary_768 folder inside the bert folder