Awesome-Text-VQA

Introduction

TextVQA is a fine-grained sub-task of VQA that aims to read the text in an image and answer questions by reasoning jointly over that text and the visual content.

Note: These are reference materials I collected during my own research on the TextVQA task.

Challenges

ICDAR 2019 Robust Reading Challenge on Scene Text Visual Question Answering [overview][result]

Papers

2021

  • (AAAI) Simple is not Easy: A Simple Strong Baseline for TextVQA and TextCaps. [paper] (3-Att-Blok)

2020

  • (CVPR) Iterative Answer Prediction with Pointer-Augmented Multimodal Transformers for TextVQA. [paper][code] (M4C)
  • (ACM MM) Cascade Reasoning Network for Text-based Visual Question Answering. [paper][code] (CRN)
  • (CVPR) Multi-Modal Graph Neural Network for Joint Reasoning on Vision and Scene Text. [paper][code] (MM-GNN)
  • (ECCV) Spatially Aware Multimodal Transformers for TextVQA. [paper][code] (SA-M4C)
  • (COLING) Finding the Evidence: Localization-aware Answer Prediction for Text Visual Question Answering. [paper] (LaAP)
  • (CVPR) On the General Value of Evidence, and Bilingual Scene-Text Visual Question Answering. [paper] (Dataset: EST-VQA / Model: QA R-CNN)
  • (arXiv) Multimodal grid features and cell pointers for Scene Text Visual Question Answering. [paper]
  • (Report) Structured Multimodal Attentions for TextVQA. [paper] (SMA)
  • (Report) TAP: Text-Aware Pre-training for Text-VQA and Text-Caption. [paper] (TAP)

2019

  • (CVPR) Towards VQA Models That Can Read. [paper][code] (LoRRA / Dataset: TextVQA)
  • (ICCV) Scene Text Visual Question Answering. [paper] (Dataset: ST-VQA)
  • (ICCV) From Strings to Things: Knowledge-enabled VQA Model that can Read and Reason. [paper] (K-VQA)
  • (ICDAR) OCR-VQA: Visual Question Answering by Reading Text in Images. [paper] (Dataset: OCR-VQA)