Awesome-Text-VQA

Introduction

TextVQA is a fine-grained sub-task of VQA that aims to read the text in an image and answer questions by reasoning jointly over that text and the visual content.

Note: These are reference materials I collected during my own research on the TextVQA task.

Challenges

ICDAR 2019 Robust Reading Challenge on Scene Text Visual Question Answering [overview][result]

Papers

2021

  • (AAAI) Simple is not Easy: A Simple Strong Baseline for TextVQA and TextCaps. [paper] (3-Att-Blok)

2020

  • (CVPR) Iterative Answer Prediction with Pointer-Augmented Multimodal Transformers for TextVQA. [paper][code] (M4C)
  • (ACM MM) Cascade Reasoning Network for Text-based Visual Question Answering. [paper][code] (CRN)
  • (CVPR) Multi-Modal Graph Neural Network for Joint Reasoning on Vision and Scene Text. [paper][code] (MM-GNN)
  • (ECCV) Spatially Aware Multimodal Transformers for TextVQA. [paper][code] (SA-M4C)
  • (COLING) Finding the Evidence: Localization-aware Answer Prediction for Text Visual Question Answering. [paper] (LaAP)
  • (CVPR) On the General Value of Evidence, and Bilingual Scene-Text Visual Question Answering. [paper] (Dataset: EST-VQA / Model: QA R-CNN)
  • (arXiv) Multimodal grid features and cell pointers for Scene Text Visual Question Answering. [paper]
  • (Report) Structured Multimodal Attentions for TextVQA. [paper] (SMA)
  • (Report) TAP: Text-Aware Pre-training for Text-VQA and Text-Caption. [paper] (TAP)

2019

  • (CVPR) Towards VQA Models That Can Read. [paper][code] (LoRRA / Dataset: TextVQA)
  • (ICCV) Scene Text Visual Question Answering. [paper] (Dataset: ST-VQA)
  • (ICCV) From Strings to Things: Knowledge-enabled VQA Model that can Read and Reason. [paper] (K-VQA)
  • (ICDAR) OCR-VQA: Visual Question Answering by Reading Text in Images. [paper] (Dataset: OCR-VQA)