visual-question-answering

There are 172 repositories under the visual-question-answering topic.

  • salesforce/BLIP

    PyTorch code for BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation

    Language: Jupyter Notebook · 4.4k stars
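
    BLIP's VQA head can be exercised in a few lines. A minimal inference sketch, assuming the Hugging Face transformers port and its Salesforce/blip-vqa-base checkpoint (the image URL is a placeholder):

    ```python
    import requests
    from PIL import Image
    from transformers import BlipProcessor, BlipForQuestionAnswering

    # Hugging Face port of BLIP's VQA model (checkpoint name per the model hub)
    processor = BlipProcessor.from_pretrained("Salesforce/blip-vqa-base")
    model = BlipForQuestionAnswering.from_pretrained("Salesforce/blip-vqa-base")

    # Placeholder URL; any RGB image works
    image = Image.open(
        requests.get("https://example.com/cat.jpg", stream=True).raw
    ).convert("RGB")

    inputs = processor(image, "How many cats are in the picture?", return_tensors="pt")
    answer_ids = model.generate(**inputs)  # the answer is generated autoregressively
    print(processor.decode(answer_ids[0], skip_special_tokens=True))
    ```
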
  • OFA-Sys/OFA

    Official repository of OFA (ICML 2022). Paper: OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework

    Language: Python · 2.4k stars
  • peteanderson80/bottom-up-attention

    Bottom-up attention model for image captioning and VQA, based on Faster R-CNN and Visual Genome

    Language: Jupyter Notebook · 1.4k stars
  • lucidrains/flamingo-pytorch

    Implementation of 🦩 Flamingo, a state-of-the-art few-shot visual question answering attention net out of DeepMind, in PyTorch

    Language: Python · 1.2k stars
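
    The architectural core here is a tanh-gated cross-attention layer that lets frozen language-model blocks read perceiver-resampled visual tokens. A generic PyTorch sketch of the gating idea (not the library's actual API):

    ```python
    import torch
    import torch.nn as nn

    class GatedCrossAttention(nn.Module):
        """Generic sketch of Flamingo-style gating: text tokens attend to
        visual tokens, and the result is scaled by tanh(gate), with the
        gate initialised at 0 so the frozen language model is unchanged
        at the start of training."""
        def __init__(self, dim: int, heads: int = 8):
            super().__init__()
            self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
            self.gate = nn.Parameter(torch.zeros(1))  # tanh(0) = 0 -> identity at init

        def forward(self, text: torch.Tensor, visual: torch.Tensor) -> torch.Tensor:
            attended, _ = self.attn(query=text, key=visual, value=visual)
            return text + torch.tanh(self.gate) * attended  # gated residual

    x = torch.randn(2, 16, 512)  # (batch, text tokens, dim)
    v = torch.randn(2, 64, 512)  # (batch, visual tokens, dim)
    print(GatedCrossAttention(512)(x, v).shape)  # torch.Size([2, 16, 512])
    ```
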
  • YehLi/xmodaler

    X-modaler is a versatile and high-performance codebase for cross-modal analytics (e.g., image captioning, video captioning, vision-language pre-training, visual question answering, visual commonsense reasoning, and cross-modal retrieval).

    Language: Python · 1k stars
  • jnhwkim/ban-vqa

    Bilinear attention networks for visual question answering

    Language: Python · 539 stars
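
    The idea behind bilinear attention is to score each image region jointly with the question and pool a multiplicative joint feature. A simplified single-glimpse sketch (the paper's formulation is multi-glimpse and more elaborate):

    ```python
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class LowRankBilinearAttention(nn.Module):
        """Simplified single-glimpse sketch of bilinear attention: question
        and region features are projected to a shared space, combined
        multiplicatively, scored, and pooled."""
        def __init__(self, q_dim: int, v_dim: int, hidden: int = 512):
            super().__init__()
            self.q_proj = nn.Linear(q_dim, hidden)
            self.v_proj = nn.Linear(v_dim, hidden)
            self.score = nn.Linear(hidden, 1)

        def forward(self, q: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
            # q: (batch, q_dim) question encoding; v: (batch, regions, v_dim)
            joint = torch.relu(self.q_proj(q)).unsqueeze(1) * torch.relu(self.v_proj(v))
            alpha = F.softmax(self.score(joint), dim=1)  # attention over regions
            return (alpha * joint).sum(dim=1)            # fused feature (batch, hidden)
    ```
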
  • MILVLG/mcan-vqa

    Deep Modular Co-Attention Networks for Visual Question Answering

    Language: Python · 433 stars
  • richard-peng-xia/awesome-multimodal-in-medical-imaging

    A collection of resources on applications of multi-modal learning in medical imaging.

  • davidmascharka/tbd-nets

    PyTorch implementation of "Transparency by Design: Closing the Gap Between Performance and Interpretability in Visual Reasoning"

    Language: Jupyter Notebook · 348 stars
  • MILVLG/openvqa

    A lightweight, scalable, and general framework for visual question answering research

    Language: Python · 311 stars
  • MMMU-Benchmark/MMMU

    This repo contains evaluation code for the paper "MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI"

    Language: Python · 282 stars
  • MILVLG/prophet

    Implementation of CVPR 2023 paper "Prompting Large Language Models with Answer Heuristics for Knowledge-based Visual Question Answering".

    Language: Python · 261 stars
  • Cyanogenoid/pytorch-vqa

    Strong baseline for visual question answering

    Language: Python · 239 stars
  • zjukg/KG-MM-Survey

    Knowledge Graphs Meet Multi-Modal Learning: A Comprehensive Survey

  • lupantech/MathVista

    MathVista: data, code, and evaluation for Mathematical Reasoning in Visual Contexts

    Language: Jupyter Notebook · 191 stars
  • markdtw/vqa-winner-cvprw-2017

    PyTorch implementation of the winner of the VQA Challenge Workshop at CVPR'17

    Language: Python · 165 stars
  • antoyang/FrozenBiLM

    [NeurIPS 2022] Zero-Shot Video Question Answering via Frozen Bidirectional Language Models

    Language: Python · 146 stars
  • qiantianwen/NuScenes-QA

    [AAAI 2024] NuScenes-QA: A Multi-modal Visual Question Answering Benchmark for Autonomous Driving Scenario.

  • MMStar-Benchmark/MMStar

    This repo contains evaluation code for the paper "Are We on the Right Way for Evaluating Large Vision-Language Models?"

    Language: Python · 122 stars
  • zhegan27/VILLA

    Research code for the NeurIPS 2020 spotlight paper "Large-Scale Adversarial Training for Vision-and-Language Representation Learning" (the UNITER adversarial training part)

    Language: Python · 119 stars
  • Yushi-Hu/tifa

    TIFA: Accurate and Interpretable Text-to-Image Faithfulness Evaluation with Question Answering

    Language: Python · 116 stars
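
    TIFA's protocol is simple to state: a language model turns the text prompt into question-answer pairs, a VQA model answers each question about the generated image, and the faithfulness score is the resulting accuracy. A schematic sketch, where the QA generator and VQA wrapper are hypothetical callables passed in by the caller:

    ```python
    from typing import Callable, List, Tuple

    def tifa_score(
        prompt: str,
        image,
        generate_qa_pairs: Callable[[str], List[Tuple[str, str]]],  # hypothetical LLM QA generator
        vqa_answer: Callable[[object, str], str],                   # hypothetical VQA model wrapper
    ) -> float:
        """Schematic TIFA-style scoring: the fraction of prompt-derived
        questions the VQA model answers as expected on the image."""
        qa_pairs = generate_qa_pairs(prompt)
        correct = sum(
            vqa_answer(image, question).strip().lower() == expected.strip().lower()
            for question, expected in qa_pairs
        )
        return correct / len(qa_pairs)
    ```
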
  • antoyang/just-ask

    [ICCV 2021 Oral + TPAMI] Just Ask: Learning to Answer Questions from Millions of Narrated Videos

    Language: Jupyter Notebook · 114 stars
  • anisha2102/docvqa

    Document Visual Question Answering

    Language: Python · 108 stars
  • HanXinzi-AI/awesome-computer-vision-resources

    A collection of computer vision projects & tools.

  • sdc17/UPop

    [ICML 2023] UPop: Unified and Progressive Pruning for Compressing Vision-Language Transformers.

    Language: Python · 90 stars
  • mesnico/RelationNetworks-CLEVR

    A PyTorch implementation of "A simple neural network module for relational reasoning", working on the CLEVR dataset

    Language: Python · 85 stars
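
    The relational reasoning module that this repo implements is compact enough to sketch: g_theta encodes every ordered pair of object features (conditioned on the question), the pair codes are summed, and f_phi maps the sum to answer logits. A minimal PyTorch sketch (dimensions and the 28-answer CLEVR vocabulary are illustrative):

    ```python
    import torch
    import torch.nn as nn

    class RelationNetwork(nn.Module):
        """Sketch of a Relation Network: score all ordered object pairs
        with g_theta, sum the pair codes, classify with f_phi."""
        def __init__(self, obj_dim: int, q_dim: int, hidden: int = 256, n_answers: int = 28):
            super().__init__()
            self.g = nn.Sequential(nn.Linear(2 * obj_dim + q_dim, hidden), nn.ReLU(),
                                   nn.Linear(hidden, hidden), nn.ReLU())
            self.f = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU(),
                                   nn.Linear(hidden, n_answers))

        def forward(self, objects: torch.Tensor, question: torch.Tensor) -> torch.Tensor:
            # objects: (batch, n, obj_dim); question: (batch, q_dim)
            b, n, d = objects.shape
            o_i = objects.unsqueeze(2).expand(b, n, n, d)
            o_j = objects.unsqueeze(1).expand(b, n, n, d)
            q = question.unsqueeze(1).unsqueeze(1).expand(b, n, n, question.size(-1))
            pairs = torch.cat([o_i, o_j, q], dim=-1)      # all ordered pairs + question
            return self.f(self.g(pairs).sum(dim=(1, 2)))  # sum over pairs, then f_phi
    ```
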
  • Shivanshu-Gupta/Visual-Question-Answering

    CNN+LSTM, attention-based, and MUTAN-based models for Visual Question Answering

    Language: Python · 71 stars
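
    The CNN+LSTM variant is the classic joint-embedding baseline: a pre-extracted CNN image vector and the LSTM's final question state are fused elementwise and classified over a fixed answer vocabulary. A minimal sketch with illustrative dimensions:

    ```python
    import torch
    import torch.nn as nn

    class CnnLstmVqa(nn.Module):
        """Sketch of the CNN+LSTM VQA baseline: elementwise fusion of a
        CNN image vector and an LSTM question encoding, then a classifier
        over a fixed answer vocabulary (dimensions are illustrative)."""
        def __init__(self, vocab: int, n_answers: int, img_dim: int = 2048, hidden: int = 1024):
            super().__init__()
            self.embed = nn.Embedding(vocab, 300)
            self.lstm = nn.LSTM(300, hidden, batch_first=True)
            self.img_fc = nn.Linear(img_dim, hidden)
            self.classify = nn.Linear(hidden, n_answers)

        def forward(self, img_feat: torch.Tensor, question: torch.Tensor) -> torch.Tensor:
            # img_feat: (batch, img_dim); question: (batch, seq) token ids
            _, (h, _) = self.lstm(self.embed(question))  # final hidden state
            q = h[-1]                                    # (batch, hidden)
            v = torch.tanh(self.img_fc(img_feat))        # (batch, hidden)
            return self.classify(q * v)                  # elementwise fusion
    ```
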
  • violetteshev/bottom-up-features

    Bottom-up feature extractor implemented in PyTorch.

    Language: Python · 71 stars
  • China-UK-ZSL/ZS-F-VQA

    [Paper][ISWC 2021] Zero-shot Visual Question Answering using Knowledge Graph

    Language: Python · 61 stars
  • mlvlab/Flipped-VQA

    Large Language Models are Temporal and Causal Reasoners for Video Question Answering (EMNLP 2023)

    Language: Python · 61 stars
  • rentainhe/TRAR-VQA

    [ICCV 2021] Official implementation of the paper "TRAR: Routing the Attention Spans in Transformers for Visual Question Answering"

    Language: Python · 61 stars
  • DenisDsh/VizWiz-VQA-PyTorch

    PyTorch VQA implementation that achieved top performance in the VizWiz Grand Challenge (ECCV 2018): Answering Visual Questions from Blind People

    Language: Jupyter Notebook · 60 stars
  • SKTBrain/KVQA

    Korean Visual Question Answering

  • badripatro/PQG

    Code for the COLING 2018 paper "Learning Semantic Sentence Embeddings using Pair-wise Discriminator"

    Language: Jupyter Notebook · 54 stars
  • allenai/aokvqa

    Official repository for the A-OKVQA dataset

    Language: Python · 51 stars
  • ai-forever/fusion_brain_aij2021

    Creating multimodal multitask models

    Language: Jupyter Notebook · 49 stars