vqa

There are 302 repositories under vqa topic.

facebookresearch/mmf
A modular framework for vision & language multimodal research from Facebook AI Research (FAIR)
Language:Python5.6k 110 658941
OpenGVLab/InternGPT
InternGPT (iGPT) is an open source demo platform where you can easily showcase your AI models. Now it supports DragGAN, ChatGPT, ImageBind, multimodal chat like GPT-4, SAM, interactive image editing, etc. Try it at igpt.opengvlab.com (支持DragGAN、ChatGPT、ImageBind、SAM的在线Demo系统)
Language:Python3.2k 41 51231
open-compass/VLMEvalKit
Open-source evaluation toolkit of large multi-modality models (LMMs), support 220+ LMMs, 80+ benchmarks
Language:Python3.1k 12 490509
roboflow/maestro
streamline the fine-tuning process for multimodal models: PaliGemma 2, Florence-2, and Qwen2.5-VL
Language:Python2.6k 34 45216
BDBC-KG-NLP/QA-Survey-CN
北京航空航天大学大数据高精尖中心自然语言处理研究团队开展了智能问答的研究与应用总结。包括基于知识图谱的问答（KBQA），基于文本的问答系统（TextQA），基于表格的问答系统（TableQA）、基于视觉的问答系统（VisualQA）和机器阅读理解（MRC）等，每类任务分别对学术界和工业界进行了相关总结。
1.8k 40 4262
peteanderson80/bottom-up-attention
Bottom-up attention model for image captioning and VQA, based on Faster R-CNN and Visual Genome
Language:Jupyter Notebook1.5k 24 126377
NVlabs/prismer
The implementation of "Prismer: A Vision-Language Model with Multi-Task Experts".
Language:Python1.3k 16 1973
microsoft/Oscar
Oscar and VinVL
Language:Python1k 25 202250
hila-chefer/Transformer-MM-Explainability
[ICCV 2021- Oral] Official PyTorch implementation for Generic Attention-model Explainability for Interpreting Bi-Modal and Encoder-Decoder Transformers, a novel method to visualize any Transformer-based network. Including examples for DETR, VQA.
Language:Jupyter Notebook870 7 38113
hengyuan-hu/bottom-up-attention-vqa
An efficient PyTorch implementation of the winning entry of the 2017 VQA Challenge.
Language:Python758 34 48181
Cadene/vqa.pytorch
Visual Question Answering in Pytorch
Language:Python733 32 48179
jayleicn/ClipBERT
[CVPR 2021 Best Student Paper Honorable Mention, Oral] Official PyTorch code for ClipBERT, an efficient framework for end-to-end learning on image-text and video-text tasks.
Language:Python723 8 6087
jokieleung/awesome-visual-question-answering
A curated list of Visual Question Answering(VQA)(Image/Video Question Answering),Visual Question Generation ,Visual Dialog ,Visual Commonsense Reasoning and related area.
666 24 394
XiaomiMiMo/MiMo-VL
MiMo-VL
56027
OpenGVLab/Multi-Modality-Arena
Chatbot Arena meets multi-modality! Multi-Modality Arena allows you to benchmark vision-language models side-by-side while providing images as inputs. Supports MiniGPT-4, LLaMA-Adapter V2, LLaVA, BLIP-2, and many more!
Language:Python539 7 2839
stanfordnlp/mac-network
Implementation for the paper "Compositional Attention Networks for Machine Reasoning" (Hudson and Manning, ICLR 2018)
Language:Python505 30 43118
vacancy/NSCL-PyTorch-Release
PyTorch implementation for the Neuro-Symbolic Concept Learner (NS-CL).
Language:Python439 18 2296
chingyaoc/awesome-vqa
Visual Q&A reading list
438 45 495
davidmascharka/tbd-nets
PyTorch implementation of "Transparency by Design: Closing the Gap Between Performance and Interpretability in Visual Reasoning"
Language:Jupyter Notebook348 15 1474
MILVLG/openvqa
A lightweight, scalable, and general framework for visual question answering research
Language:Python327 11 2964
abachaa/Existing-Medical-QA-Datasets
Multimodal Question Answering in the Medical Domain: A summary of Existing Datasets and Systems
304 12 036
FuxiaoLiu/LRV-Instruction
[ICLR'24] Mitigating Hallucination in Large Multi-Modal Models via Robust Instruction Tuning
Language:Python286 13 2415
Cyanogenoid/pytorch-vqa
Strong baseline for visual question answering
Language:Python241 7 21100
FusionBrainLab/OmniFusion
OmniFusion — a multimodal model to communicate using text and images
Language:Python233 5 426
X-PLUG/mPLUG-2
mPLUG-2: A Modularized Multi-modal Foundation Model Across Text, Image and Video (ICML 2023)
Language:Python229 4 2520
shure-dev/Awesome-LLM-Papers-Comprehensive-Topics
Awesome LLM Papers and repos on very comprehensive topics.
226 11 723
linjieli222/VQA_ReGAT
Research Code for ICCV 2019 paper "Relation-aware Graph Attention Network for Visual Question Answering"
Language:Python185 6 4138
OatmealLiu/FineR
[ICLR'24] Democratizing Fine-grained Visual Recognition with Large Language Models
Language:Python183 3 126
yuzcccc/vqa-mfb
Language:Python182 7 1240
JackYFL/awesome-VLLMs
This repository collects papers on VLLM applications. We will update new papers irregularly.
167 3 114
antoyang/FrozenBiLM
[NeurIPS 2022] Zero-Shot Video Question Answering via Frozen Bidirectional Language Models
Language:Python157 4 1520
thaolmk54/hcrn-videoqa
Implementation for the paper "Hierarchical Conditional Relation Networks for Video Question Answering" (Le et al., CVPR 2020, Oral)
Language:Python133 7 1926
vztu/VIDEVAL
[IEEE TIP'2021] "UGC-VQA: Benchmarking Blind Video Quality Assessment for User Generated Content", Zhengzhong Tu, Yilin Wang, Neil Birkbeck, Balu Adsumilli, Alan C. Bovik
Language:MATLAB131 6 1520
wangleihitcs/Papers
读过的CV方向的一些论文，图像生成文字、弱监督分割等
125 5 120
yuleiniu/cfvqa
[CVPR 2021] Counterfactual VQA: A Cause-Effect Look at Language Bias
Language:Python125 2 2213
antoyang/just-ask
[ICCV 2021 Oral + TPAMI] Just Ask: Learning to Answer Questions from Millions of Narrated Videos
Language:Jupyter Notebook123 4 1315

vqa

facebookresearch/mmf

OpenGVLab/InternGPT

open-compass/VLMEvalKit

roboflow/maestro

BDBC-KG-NLP/QA-Survey-CN

peteanderson80/bottom-up-attention

NVlabs/prismer

microsoft/Oscar

hila-chefer/Transformer-MM-Explainability

hengyuan-hu/bottom-up-attention-vqa

Cadene/vqa.pytorch

jayleicn/ClipBERT

jokieleung/awesome-visual-question-answering

XiaomiMiMo/MiMo-VL

OpenGVLab/Multi-Modality-Arena

stanfordnlp/mac-network

vacancy/NSCL-PyTorch-Release

chingyaoc/awesome-vqa

davidmascharka/tbd-nets

MILVLG/openvqa

abachaa/Existing-Medical-QA-Datasets

FuxiaoLiu/LRV-Instruction

Cyanogenoid/pytorch-vqa

FusionBrainLab/OmniFusion

X-PLUG/mPLUG-2

shure-dev/Awesome-LLM-Papers-Comprehensive-Topics

linjieli222/VQA_ReGAT

OatmealLiu/FineR

yuzcccc/vqa-mfb

JackYFL/awesome-VLLMs

antoyang/FrozenBiLM

thaolmk54/hcrn-videoqa

vztu/VIDEVAL

wangleihitcs/Papers

yuleiniu/cfvqa

antoyang/just-ask