A set of general rules for extracting information from forms is designed and implemented independently.
On more than 100 customs declaration invoices, the key-value pair extraction accuracy reaches 98%.
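The rules themselves are not listed here. To show the general idea of rule-based key-value extraction and of a per-pair accuracy figure, here is a minimal sketch. It assumes OCR has already turned the form into text lines. The field names, regex patterns, and the exact-match accuracy definition are all hypothetical illustrations, not the rules or metric used in this project.

```python
import re

# Hypothetical field rules: each key maps to a regex whose first group is the value.
# These patterns are illustrative only, not the rules used in this project.
FIELD_RULES = {
    "invoice_no": re.compile(r"Invoice\s*No\.?\s*[:：]\s*(\S+)", re.IGNORECASE),
    "date": re.compile(r"Date\s*[:：]\s*(\d{4}-\d{2}-\d{2})", re.IGNORECASE),
    "total": re.compile(r"Total\s*(?:Amount)?\s*[:：]\s*([\d,]+\.\d{2})", re.IGNORECASE),
}


def extract_key_values(lines):
    """Apply every rule to each OCR line, keeping the first match per key."""
    result = {}
    for line in lines:
        for key, pattern in FIELD_RULES.items():
            if key in result:
                continue  # this key already has a value
            match = pattern.search(line)
            if match:
                result[key] = match.group(1)
    return result


def key_value_accuracy(predicted, expected):
    """Assumed metric: fraction of expected key-value pairs reproduced exactly."""
    if not expected:
        return 0.0
    correct = sum(1 for k, v in expected.items() if predicted.get(k) == v)
    return correct / len(expected)


if __name__ == "__main__":
    ocr_lines = [
        "Invoice No.: INV-2019-0042",
        "Date: 2019-05-17",
        "Total Amount: 1,250.00",
    ]
    pred = extract_key_values(ocr_lines)
    print(pred)
    print(key_value_accuracy(pred, pred))
```

Keeping each field as a separate (key, pattern) rule makes it easy to add or adjust fields per form type without touching the extraction loop.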
An MFCN based on BERT is reproduced, and a series of experiments is carried out covering training-set requirements,
training cost, model generalization, fine-tuning for downstream tasks, and parameter tuning.
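In the MFCN paper, text is fed to the network roughly as a "text embedding map": pixels covered by a piece of text carry that text's embedding vector, so text and image share the same spatial grid. A minimal stdlib sketch of building such a map, assuming BERT vectors (for example 768-dimensional) have already been computed for each text box. The function name `build_text_embedding_map` and the `(x0, y0, x1, y1)` box format are hypothetical; this repository's actual pipeline may differ.

```python
from typing import List, Sequence, Tuple

# Hypothetical box format: (x0, y0, x1, y1) in pixels, with x1 and y1 exclusive.
Box = Tuple[int, int, int, int]


def build_text_embedding_map(
    height: int,
    width: int,
    dim: int,
    boxes: Sequence[Box],
    embeddings: Sequence[Sequence[float]],
) -> List[List[List[float]]]:
    """Paint each text box's embedding into a height x width x dim grid.

    Pixels outside every box stay zero; where boxes overlap, later boxes win.
    """
    grid = [[[0.0] * dim for _ in range(width)] for _ in range(height)]
    for (x0, y0, x1, y1), vec in zip(boxes, embeddings):
        if len(vec) != dim:
            raise ValueError("embedding length must equal dim")
        # Clip the box to the page so out-of-range coordinates are ignored.
        for y in range(max(0, y0), min(height, y1)):
            for x in range(max(0, x0), min(width, x1)):
                grid[y][x] = list(vec)
    return grid


if __name__ == "__main__":
    emb_map = build_text_embedding_map(
        4, 6, 3, [(0, 0, 2, 1), (3, 2, 6, 4)], [[1.0, 0.0, 0.0], [0.0, 1.0, 0.5]]
    )
    for row in emb_map:
        print([cell[0] for cell in row])  # first channel of each pixel
```

In a real model the grid would be a tensor concatenated with the image channels; plain lists are used here only to keep the sketch dependency-free.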
Feature-engineering experiments with LightGBM raise the F1 score of one-shot learning above 0.9.
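For reference, the F1 score is the harmonic mean of precision $P$ and recall $R$, i.e. $F_1 = 2PR/(P+R)$. The README does not say what unit is scored (tokens, text boxes, or fields), so this minimal sketch assumes plain binary labels with `1` as the positive class.

```python
def f1_score(y_true, y_pred, positive=1):
    """Binary F1: harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: precision or recall is zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # 3 true positives, 1 false positive, 1 false negative -> P = R = 0.75
    print(f1_score([1, 1, 1, 0, 0, 1], [1, 1, 0, 0, 1, 1]))
```

scikit-learn's `f1_score` computes the same quantity and is the usual choice alongside LightGBM; the hand-written version only makes the formula explicit.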
Learning to Extract Semantic Structure from Documents Using Multimodal Fully Convolutional Neural Networks
- Link
- Place the downloaded model in the models folder inside the mfcn folder.
- Link
- Place the downloaded auxiliary files for data dimensionality reduction in the auxiliary_768 folder inside the bert folder.
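Before running anything, it can help to confirm that both downloads ended up where the instructions above expect them. A minimal sketch, assuming it is run from the repository root and that "in place" means the target folder exists and is not empty. The helper name `check_layout` is hypothetical; the folder paths come from the instructions.

```python
from pathlib import Path

# Target folders from the download instructions, relative to the repository root.
EXPECTED_DIRS = {
    "MFCN model": Path("mfcn") / "models",
    "BERT dimensionality-reduction auxiliary files": Path("bert") / "auxiliary_768",
}


def check_layout(repo_root="."):
    """Return a message for each asset whose target folder is missing or empty."""
    root = Path(repo_root)
    missing = []
    for name, rel in EXPECTED_DIRS.items():
        folder = root / rel
        # Short-circuit: iterdir() is only called when the folder exists.
        if not folder.is_dir() or not any(folder.iterdir()):
            missing.append(f"{name}: expected files in {folder}")
    return missing


if __name__ == "__main__":
    problems = check_layout()
    for problem in problems:
        print("Missing ->", problem)
    if not problems:
        print("All downloaded assets are in place.")
```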