# Awesome-Text-VQA

## Introduction
TextVQA is a fine-grained branch of the VQA task: models must read the text in an image and answer questions by reasoning jointly over that text and the visual content.

Note: this list collects the reference materials I gathered during my own research on the TextVQA task.
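In TextVQA, many answers are copied verbatim from the text that OCR extracts from the image (this is the motivation behind the pointer mechanisms in models such as LoRRA and M4C listed below). A minimal illustrative sketch of what a sample and a trivial OCR-copy baseline look like; the field names and the `copy_from_ocr` helper are made up for illustration, not taken from any dataset release:

```python
# Hypothetical TextVQA sample (field names are illustrative only).
sample = {
    "question": "What brand is the soda can?",
    "ocr_tokens": ["coca", "cola", "330ml"],  # text an OCR system read from the image
    "answers": ["coca cola"],
}

def copy_from_ocr(ocr_tokens):
    """Toy baseline: answer by copying the alphabetic OCR tokens.

    Real models instead score each OCR token against the question and
    image features, and can mix copied tokens with vocabulary words.
    """
    words = [t for t in ocr_tokens if t.isalpha()]
    return " ".join(words)

print(copy_from_ocr(sample["ocr_tokens"]))  # → coca cola
```

Even this crude baseline shows why reading matters: the answer string exists nowhere in a fixed answer vocabulary, only in the image's text.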
## Challenges

- ICDAR 2019 Robust Reading Challenge on Scene Text Visual Question Answering [overview] [result]
## Papers

### 2021
- (AAAI) Simple is not Easy: A Simple Strong Baseline for TextVQA and TextCaps. [paper] (3-Att-Blok)
### 2020
- (CVPR) Iterative Answer Prediction with Pointer-Augmented Multimodal Transformers for TextVQA. [paper][code] (M4C)
- (ACM MM) Cascade Reasoning Network for Text-based Visual Question Answering. [paper][code] (CRN)
- (CVPR) Multi-Modal Graph Neural Network for Joint Reasoning on Vision and Scene Text. [paper][code] (MM-GNN)
- (ECCV) Spatially Aware Multimodal Transformers for TextVQA. [paper][code] (SA-M4C)
- (COLING) Finding the Evidence: Localization-aware Answer Prediction for Text Visual Question Answering. [paper] (LaAP)
- (CVPR) On the General Value of Evidence, and Bilingual Scene-Text Visual Question Answering. [paper] (Dataset: EST-VQA / Model: QA R-CNN)
- (arXiv) Multimodal grid features and cell pointers for Scene Text Visual Question Answering. [paper]
- (Report) Structured Multimodal Attentions for TextVQA. [paper] (SMA)
- (Report) TAP: Text-Aware Pre-training for Text-VQA and Text-Caption. [paper] (TAP)
### 2019
- (CVPR) Towards VQA Models That Can Read. [paper][code] (LoRRA / Dataset: TextVQA)
- (ICCV) Scene Text Visual Question Answering. [paper] (Dataset: ST-VQA)
- (ICCV) From Strings to Things: Knowledge-enabled VQA Model that can Read and Reason. [paper] (K-VQA)
- (ICDAR) OCR-VQA: Visual Question Answering by Reading Text in Images. [paper] (Dataset: OCR-VQA)