visual-question-answering

There are 172 repositories under the visual-question-answering topic.

  • salesforce/BLIP

    PyTorch code for BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation

    Language: Jupyter Notebook · 4.4k stars
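
    BLIP's VQA head can be exercised in a few lines. A minimal inference sketch, assuming the Hugging Face transformers port and its Salesforce/blip-vqa-base checkpoint (the image URL is a placeholder):

    ```python
    import requests
    from PIL import Image
    from transformers import BlipProcessor, BlipForQuestionAnswering

    # Hugging Face port of BLIP's VQA model (checkpoint name per the model hub)
    processor = BlipProcessor.from_pretrained("Salesforce/blip-vqa-base")
    model = BlipForQuestionAnswering.from_pretrained("Salesforce/blip-vqa-base")

    # Placeholder URL; any RGB image works
    image = Image.open(
        requests.get("https://example.com/cat.jpg", stream=True).raw
    ).convert("RGB")

    inputs = processor(image, "How many cats are in the picture?", return_tensors="pt")
    answer_ids = model.generate(**inputs)  # the answer is generated autoregressively
    print(processor.decode(answer_ids[0], skip_special_tokens=True))
    ```
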
  • OFA-Sys/OFA

    Official repository of OFA (ICML 2022). Paper: OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework

    Language: Python · 2.4k stars
  • peteanderson80/bottom-up-attention

    Bottom-up attention model for image captioning and VQA, based on Faster R-CNN and Visual Genome

    Language: Jupyter Notebook · 1.4k stars
  • lucidrains/flamingo-pytorch

    Implementation of 🦩 Flamingo, a state-of-the-art few-shot visual question answering attention net out of DeepMind, in PyTorch

    Language: Python · 1.2k stars
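
    The architectural core here is a tanh-gated cross-attention layer that lets frozen language-model blocks read perceiver-resampled visual tokens. A generic PyTorch sketch of the gating idea (not the library's actual API):

    ```python
    import torch
    import torch.nn as nn

    class GatedCrossAttention(nn.Module):
        """Generic sketch of Flamingo-style gating: text tokens attend to
        visual tokens, and the result is scaled by tanh(gate), with the
        gate initialised at 0 so the frozen language model is unchanged
        at the start of training."""
        def __init__(self, dim: int, heads: int = 8):
            super().__init__()
            self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
            self.gate = nn.Parameter(torch.zeros(1))  # tanh(0) = 0 -> identity at init

        def forward(self, text: torch.Tensor, visual: torch.Tensor) -> torch.Tensor:
            attended, _ = self.attn(query=text, key=visual, value=visual)
            return text + torch.tanh(self.gate) * attended  # gated residual

    x = torch.randn(2, 16, 512)  # (batch, text tokens, dim)
    v = torch.randn(2, 64, 512)  # (batch, visual tokens, dim)
    print(GatedCrossAttention(512)(x, v).shape)  # torch.Size([2, 16, 512])
    ```
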
  • YehLi/xmodaler

    X-modaler is a versatile and high-performance codebase for cross-modal analytics (e.g., image captioning, video captioning, vision-language pre-training, visual question answering, visual commonsense reasoning, and cross-modal retrieval).

    Language: Python · 1k stars
  • jnhwkim/ban-vqa

    Bilinear attention networks for visual question answering

    Language: Python · 539 stars
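
    The idea behind bilinear attention is to score each image region jointly with the question and pool a multiplicative joint feature. A simplified single-glimpse sketch (the paper's formulation is multi-glimpse and more elaborate):

    ```python
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class LowRankBilinearAttention(nn.Module):
        """Simplified single-glimpse sketch of bilinear attention: question
        and region features are projected to a shared space, combined
        multiplicatively, scored, and pooled."""
        def __init__(self, q_dim: int, v_dim: int, hidden: int = 512):
            super().__init__()
            self.q_proj = nn.Linear(q_dim, hidden)
            self.v_proj = nn.Linear(v_dim, hidden)
            self.score = nn.Linear(hidden, 1)

        def forward(self, q: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
            # q: (batch, q_dim) question encoding; v: (batch, regions, v_dim)
            joint = torch.relu(self.q_proj(q)).unsqueeze(1) * torch.relu(self.v_proj(v))
            alpha = F.softmax(self.score(joint), dim=1)  # attention over regions
            return (alpha * joint).sum(dim=1)            # fused feature (batch, hidden)
    ```
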
  • MILVLG/mcan-vqa

    Deep Modular Co-Attention Networks for Visual Question Answering

    Language: Python · 433 stars
  • richard-peng-xia/awesome-multimodal-in-medical-imaging

    A collection of resources on applications of multi-modal learning in medical imaging.

  • davidmascharka/tbd-nets

    PyTorch implementation of "Transparency by Design: Closing the Gap Between Performance and Interpretability in Visual Reasoning"

    Language: Jupyter Notebook · 348 stars
  • MILVLG/openvqa

    A lightweight, scalable, and general framework for visual question answering research

    Language: Python · 311 stars
  • MMMU-Benchmark/MMMU

    This repo contains evaluation code for the paper "MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI"

    Language: Python · 282 stars
  • MILVLG/prophet

    Implementation of CVPR 2023 paper "Prompting Large Language Models with Answer Heuristics for Knowledge-based Visual Question Answering".

    Language: Python · 261 stars
  • Cyanogenoid/pytorch-vqa

    Strong baseline for visual question answering

    Language: Python · 239 stars
  • zjukg/KG-MM-Survey

    Knowledge Graphs Meet Multi-Modal Learning: A Comprehensive Survey

  • lupantech/MathVista

    MathVista: data, code, and evaluation for Mathematical Reasoning in Visual Contexts

    Language: Jupyter Notebook · 191 stars
  • markdtw/vqa-winner-cvprw-2017

    PyTorch implementation of the winner of the VQA Challenge Workshop at CVPR'17

    Language: Python · 165 stars
  • antoyang/FrozenBiLM

    [NeurIPS 2022] Zero-Shot Video Question Answering via Frozen Bidirectional Language Models

    Language: Python · 146 stars
  • qiantianwen/NuScenes-QA

    [AAAI 2024] NuScenes-QA: A Multi-modal Visual Question Answering Benchmark for Autonomous Driving Scenario.

  • MMStar-Benchmark/MMStar

    This repo contains evaluation code for the paper "Are We on the Right Way for Evaluating Large Vision-Language Models?"

    Language: Python · 122 stars
  • zhegan27/VILLA

    Research code for the NeurIPS 2020 spotlight paper "Large-Scale Adversarial Training for Vision-and-Language Representation Learning" (the UNITER adversarial training part)

    Language: Python · 119 stars
  • Yushi-Hu/tifa

    TIFA: Accurate and Interpretable Text-to-Image Faithfulness Evaluation with Question Answering

    Language: Python · 116 stars
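
    TIFA's protocol is simple to state: a language model turns the text prompt into question-answer pairs, a VQA model answers each question about the generated image, and the faithfulness score is the resulting accuracy. A schematic sketch, where the QA generator and VQA wrapper are hypothetical callables passed in by the caller:

    ```python
    from typing import Callable, List, Tuple

    def tifa_score(
        prompt: str,
        image,
        generate_qa_pairs: Callable[[str], List[Tuple[str, str]]],  # hypothetical LLM QA generator
        vqa_answer: Callable[[object, str], str],                   # hypothetical VQA model wrapper
    ) -> float:
        """Schematic TIFA-style scoring: the fraction of prompt-derived
        questions the VQA model answers as expected on the image."""
        qa_pairs = generate_qa_pairs(prompt)
        correct = sum(
            vqa_answer(image, question).strip().lower() == expected.strip().lower()
            for question, expected in qa_pairs
        )
        return correct / len(qa_pairs)
    ```
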
  • antoyang/just-ask

    [ICCV 2021 Oral + TPAMI] Just Ask: Learning to Answer Questions from Millions of Narrated Videos

    Language: Jupyter Notebook · 114 stars
  • anisha2102/docvqa

    Document Visual Question Answering

    Language: Python · 108 stars
  • HanXinzi-AI/awesome-computer-vision-resources

    A collection of computer vision projects & tools.

  • sdc17/UPop

    [ICML 2023] UPop: Unified and Progressive Pruning for Compressing Vision-Language Transformers.

    Language: Python · 90 stars
  • mesnico/RelationNetworks-CLEVR

    A PyTorch implementation of "A simple neural network module for relational reasoning", working on the CLEVR dataset

    Language: Python · 85 stars
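
    The relational reasoning module that this repo implements is compact enough to sketch: g_theta encodes every ordered pair of object features (conditioned on the question), the pair codes are summed, and f_phi maps the sum to answer logits. A minimal PyTorch sketch (dimensions and the 28-answer CLEVR vocabulary are illustrative):

    ```python
    import torch
    import torch.nn as nn

    class RelationNetwork(nn.Module):
        """Sketch of a Relation Network: score all ordered object pairs
        with g_theta, sum the pair codes, classify with f_phi."""
        def __init__(self, obj_dim: int, q_dim: int, hidden: int = 256, n_answers: int = 28):
            super().__init__()
            self.g = nn.Sequential(nn.Linear(2 * obj_dim + q_dim, hidden), nn.ReLU(),
                                   nn.Linear(hidden, hidden), nn.ReLU())
            self.f = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU(),
                                   nn.Linear(hidden, n_answers))

        def forward(self, objects: torch.Tensor, question: torch.Tensor) -> torch.Tensor:
            # objects: (batch, n, obj_dim); question: (batch, q_dim)
            b, n, d = objects.shape
            o_i = objects.unsqueeze(2).expand(b, n, n, d)
            o_j = objects.unsqueeze(1).expand(b, n, n, d)
            q = question.unsqueeze(1).unsqueeze(1).expand(b, n, n, question.size(-1))
            pairs = torch.cat([o_i, o_j, q], dim=-1)      # all ordered pairs + question
            return self.f(self.g(pairs).sum(dim=(1, 2)))  # sum over pairs, then f_phi
    ```
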
  • Shivanshu-Gupta/Visual-Question-Answering

    CNN+LSTM, attention-based, and MUTAN-based models for Visual Question Answering

    Language: Python · 71 stars
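
    The CNN+LSTM variant is the classic joint-embedding baseline: a pre-extracted CNN image vector and the LSTM's final question state are fused elementwise and classified over a fixed answer vocabulary. A minimal sketch with illustrative dimensions:

    ```python
    import torch
    import torch.nn as nn

    class CnnLstmVqa(nn.Module):
        """Sketch of the CNN+LSTM VQA baseline: elementwise fusion of a
        CNN image vector and an LSTM question encoding, then a classifier
        over a fixed answer vocabulary (dimensions are illustrative)."""
        def __init__(self, vocab: int, n_answers: int, img_dim: int = 2048, hidden: int = 1024):
            super().__init__()
            self.embed = nn.Embedding(vocab, 300)
            self.lstm = nn.LSTM(300, hidden, batch_first=True)
            self.img_fc = nn.Linear(img_dim, hidden)
            self.classify = nn.Linear(hidden, n_answers)

        def forward(self, img_feat: torch.Tensor, question: torch.Tensor) -> torch.Tensor:
            # img_feat: (batch, img_dim); question: (batch, seq) token ids
            _, (h, _) = self.lstm(self.embed(question))  # final hidden state
            q = h[-1]                                    # (batch, hidden)
            v = torch.tanh(self.img_fc(img_feat))        # (batch, hidden)
            return self.classify(q * v)                  # elementwise fusion
    ```
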
  • violetteshev/bottom-up-features

    Bottom-up feature extractor implemented in PyTorch.

    Language: Python · 71 stars
  • China-UK-ZSL/ZS-F-VQA

    [Paper][ISWC 2021] Zero-shot Visual Question Answering using Knowledge Graph

    Language: Python · 61 stars
  • mlvlab/Flipped-VQA

    Large Language Models are Temporal and Causal Reasoners for Video Question Answering (EMNLP 2023)

    Language: Python · 61 stars
  • rentainhe/TRAR-VQA

    [ICCV 2021] Official implementation of the paper "TRAR: Routing the Attention Spans in Transformers for Visual Question Answering"

    Language: Python · 61 stars
  • DenisDsh/VizWiz-VQA-PyTorch

    PyTorch VQA implementation that achieved top performance in the VizWiz Grand Challenge (ECCV 2018): Answering Visual Questions from Blind People

    Language: Jupyter Notebook · 60 stars
  • SKTBrain/KVQA

    Korean Visual Question Answering

  • badripatro/PQG

    Code for the COLING 2018 paper "Learning Semantic Sentence Embeddings using Pair-wise Discriminator"

    Language: Jupyter Notebook · 54 stars
  • allenai/aokvqa

    Official repository for the A-OKVQA dataset

    Language: Python · 51 stars
  • ai-forever/fusion_brain_aij2021

    Creating multimodal multitask models

    Language: Jupyter Notebook · 49 stars