Here is the code for the ssbaseline model. We also provide OCR results, features, and models. The code is built on top of M4C, where more detailed information can be found.
If you use ssbaseline in your work, please cite:
@article{zhu2020simple,
title={Simple is not Easy: A Simple Strong Baseline for TextVQA and TextCaps},
author={Zhu, Qi and Gao, Chenyu and Wang, Peng and Wu, Qi},
journal={arXiv preprint arXiv:2012.05153},
year={2020}
}
First install the repo using
git clone https://github.com/ZephyrZhuQi/ssbaseline.git ~/ssbaseline
cd ~/ssbaseline
python setup.py build develop
We provide SBD-Trans OCR results for the TextVQA and ST-VQA datasets. The corresponding OCR Faster R-CNN features and OCR Recog-CNN features are also released.
Datasets | ImDBs | Object Faster R-CNN Features | OCR Faster R-CNN Features | OCR Recog-CNN Features |
---|---|---|---|---|
TextVQA | TextVQA ImDB | Open Images | TextVQA SBD-Trans OCRs | TextVQA SBD-Trans OCRs |
ST-VQA | ST-VQA ImDB | ST-VQA Objects | ST-VQA SBD-Trans OCRs | ST-VQA SBD-Trans OCRs |
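To inspect a downloaded ImDB before training, a small helper can help. The sketch below assumes the M4C/Pythia convention as we understand it: a `.npy` file holding an object array whose first element is a metadata header and whose remaining elements are per-question dicts with fields such as `question`, `image_id`, and `ocr_tokens`. Check this layout and these field names against the actual files. `load_imdb` and `summarize_imdb` are hypothetical helpers, not part of the repo.

```python
from collections import Counter
from typing import Any, Dict, Sequence


def load_imdb(path: str) -> Sequence[Any]:
    """Load an ImDB .npy file (assumed to be a pickled object array)."""
    import numpy as np  # imported here so summarize_imdb() works without numpy

    # allow_pickle=True is required because the entries are Python dicts.
    return np.load(path, allow_pickle=True)


def summarize_imdb(imdb: Sequence[Any]) -> Dict[str, Any]:
    """Summarize an ImDB whose element 0 is a metadata header (assumed layout)."""
    entries = list(imdb[1:])  # skip the metadata header
    images = {e["image_id"] for e in entries}
    ocr_counts = [len(e.get("ocr_tokens", [])) for e in entries]
    token_freq = Counter(t for e in entries for t in e.get("ocr_tokens", []))
    return {
        "num_questions": len(entries),
        "num_images": len(images),
        "max_ocr_tokens": max(ocr_counts, default=0),
        "top_ocr_tokens": token_freq.most_common(3),
    }


if __name__ == "__main__":
    # Toy ImDB with made-up entries shaped like the assumed layout.
    toy = [
        {"version": "toy"},
        {"question": "what brand is this?", "image_id": "img1", "ocr_tokens": ["coca", "cola"]},
        {"question": "what does the sign say?", "image_id": "img2", "ocr_tokens": ["stop"]},
    ]
    print(summarize_imdb(toy))
```

With a real file, call `summarize_imdb(load_imdb(path))`, where `path` points at one of the released ImDBs.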
We release the following pretrained model for ssbaseline on TextVQA: ssbaseline trained with ST-VQA as additional data and with SBD-Trans OCRs (our best model).
Datasets | Config Files (under configs/vqa/) | Pretrained Models | Metrics | Notes |
---|---|---|---|---|
TextVQA (m4c_textvqa) | m4c_textvqa/m4c_sbd.yml (needs modification: add the ST-VQA ImDB and feature files; see m4c_with_stvqa.yml for reference) | ssbaseline_with_stvqa | val accuracy: 45.53%; test accuracy: 45.66% | SBD-Trans OCRs; ST-VQA as additional data |
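The config modification for ssbaseline_with_stvqa can be sketched as follows. This assumes the YAML config has been loaded into a nested dict and follows the M4C layout `dataset_attributes -> m4c_textvqa -> imdb_files / image_features -> train`. It further assumes that ImDB files and feature entries are matched by position and that each feature entry lists the object and OCR feature directories separated by a comma. The layout and the placeholder paths are assumptions; m4c_with_stvqa.yml remains the reference. `add_extra_train_data` is a hypothetical helper.

```python
import copy
from typing import Any, Dict


def add_extra_train_data(
    config: Dict[str, Any], dataset: str, imdb_file: str, feature_dirs: str
) -> Dict[str, Any]:
    """Return a copy of `config` with one more training ImDB and feature entry.

    Assumes the M4C-style layout
    dataset_attributes -> <dataset> -> {imdb_files, image_features} -> train.
    """
    new_config = copy.deepcopy(config)  # leave the caller's dict untouched
    attrs = new_config["dataset_attributes"][dataset]
    imdbs = attrs["imdb_files"]["train"]
    feats = attrs["image_features"]["train"]
    # Assumed: the i-th ImDB uses the i-th feature entry, so the lists must align.
    if len(imdbs) != len(feats):
        raise ValueError("imdb_files and image_features must have the same length")
    imdbs.append(imdb_file)
    # Assumed: object and OCR feature directories are joined with a comma.
    feats.append(feature_dirs)
    return new_config


if __name__ == "__main__":
    # Placeholder paths for illustration only; copy the real ones from
    # m4c_with_stvqa.yml.
    base = {
        "dataset_attributes": {
            "m4c_textvqa": {
                "imdb_files": {"train": ["PATH/TO/textvqa_train_imdb.npy"]},
                "image_features": {"train": ["PATH/TO/textvqa_obj,PATH/TO/textvqa_ocr"]},
            }
        }
    }
    merged = add_extra_train_data(
        base, "m4c_textvqa", "PATH/TO/stvqa_train_imdb.npy", "PATH/TO/stvqa_obj,PATH/TO/stvqa_ocr"
    )
    print(merged["dataset_attributes"]["m4c_textvqa"]["imdb_files"]["train"])
```

Returning a deep copy keeps the original m4c_sbd.yml settings intact, so the TextVQA-only and TextVQA+ST-VQA configurations can be compared side by side.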
Please follow the M4C README for the training and evaluation of the M4C model on each dataset.
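Question: Feature Extraction (Is the code for extracting each type of feature used in the paper open-sourced? I need to use it on some other data, so I would like to extract the features myself.)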
Answer: There are several kinds of features and quite a lot of extraction code; below is the code I used for each:
- To extract features for the OCR bounding boxes, you need to modify the maskrcnn detection framework by replacing the RPN layer with the given (hard-coded) OCR bounding boxes. I used the code in this repo, which should be used together with the feature extraction script.
- To extract Faster R-CNN features for the object (OBJ) bounding boxes, you do not need to modify the maskrcnn framework; the features can be extracted directly with this repo (detection framework) and the corresponding extraction script.
- To obtain the OCR bounding boxes, we use this repo with the MLT 2017 model.
- The code that takes the OCR bounding boxes, produces the OCR recognition results, and extracts the OCR Recog-CNN features was not written by me and is not open-sourced yet, so I cannot share it at the moment.
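Replacing the RPN with hard-coded boxes means the RoI pooling stage receives the OCR boxes directly instead of learned proposals. The toy sketch below illustrates only that idea, using classic RoI max pooling on a single-channel feature map in pure Python. It is not the maskrcnn code, and `roi_max_pool` and `pool_fixed_boxes` are hypothetical names. Real detection frameworks typically use RoIAlign on multi-channel feature maps and pass the pooled result through further layers.

```python
import math
from typing import List, Sequence

FeatureMap = List[List[float]]  # toy single-channel H x W feature map


def roi_max_pool(
    feature_map: FeatureMap,
    box: Sequence[float],
    output_size: int = 2,
    spatial_scale: float = 1.0,
) -> List[List[float]]:
    """Max-pool box = (x1, y1, x2, y2), given in image pixels, into an
    output_size x output_size grid (classic RoI pooling)."""
    height, width = len(feature_map), len(feature_map[0])
    # Project the box from image coordinates onto the feature map and clamp it.
    x1, y1, x2, y2 = (round(c * spatial_scale) for c in box)
    x1, x2 = max(0, min(x1, width - 1)), max(0, min(x2, width - 1))
    y1, y2 = max(0, min(y1, height - 1)), max(0, min(y2, height - 1))
    roi_w, roi_h = max(x2 - x1 + 1, 1), max(y2 - y1 + 1, 1)
    bin_w, bin_h = roi_w / output_size, roi_h / output_size
    pooled = []
    for i in range(output_size):
        # Rows covered by bin i; floor/ceil guarantee every bin is non-empty.
        ys = range(y1 + math.floor(i * bin_h), y1 + math.ceil((i + 1) * bin_h))
        row = []
        for j in range(output_size):
            xs = range(x1 + math.floor(j * bin_w), x1 + math.ceil((j + 1) * bin_w))
            row.append(max(feature_map[y][x] for y in ys for x in xs))
        pooled.append(row)
    return pooled


def pool_fixed_boxes(
    feature_map: FeatureMap,
    boxes: Sequence[Sequence[float]],
    output_size: int = 2,
    spatial_scale: float = 1.0,
) -> List[List[List[float]]]:
    """Pool one feature per given box (e.g. OCR boxes) instead of RPN proposals."""
    return [roi_max_pool(feature_map, b, output_size, spatial_scale) for b in boxes]


if __name__ == "__main__":
    fm = [[4 * y + x for x in range(4)] for y in range(4)]  # values 0..15
    ocr_boxes = [(0, 0, 3, 3), (2, 2, 3, 3)]  # made-up "OCR" boxes
    print(pool_fixed_boxes(fm, ocr_boxes))
```

In the toy example the first box covers the whole 4x4 map, so each of the four bins keeps the maximum of one 2x2 quadrant.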