vqa

There are 260 repositories under the vqa topic.
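
For quick orientation: visual question answering (VQA) takes an image and a natural-language question as input and returns a short answer. As a minimal, illustrative sketch (not tied to any repository listed below), the Hugging Face transformers pipeline can run VQA inference with a BLIP checkpoint such as Salesforce/blip-vqa-base, assuming those packages and weights are available:

    # Minimal VQA inference sketch (illustrative; assumes the `transformers` and
    # `Pillow` packages and the Salesforce/blip-vqa-base checkpoint are available).
    from transformers import pipeline
    from PIL import Image

    # Build a visual-question-answering pipeline backed by a BLIP VQA model.
    vqa = pipeline("visual-question-answering", model="Salesforce/blip-vqa-base")

    image = Image.open("example.jpg")            # any local RGB image
    question = "How many people are in the picture?"

    # The pipeline returns a list of {"answer": ..., "score": ...} candidates.
    result = vqa(image=image, question=question, top_k=1)
    print(result[0]["answer"], result[0]["score"])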

  • facebookresearch/mmf

    A modular framework for vision & language multimodal research from Facebook AI Research (FAIR)

    Language: Python · 5.5k stars
  • OpenGVLab/InternGPT

    InternGPT (iGPT) is an open-source demo platform where you can easily showcase your AI models. It currently supports DragGAN, ChatGPT, ImageBind, GPT-4-style multimodal chat, SAM, interactive image editing, and more. Try it at igpt.opengvlab.com (an online demo system supporting DragGAN, ChatGPT, ImageBind, and SAM).

    Language: Python · 3.2k stars
  • BDBC-KG-NLP/QA-Survey-CN

    A survey of intelligent question answering research and applications from the natural language processing team at Beihang University's Advanced Innovation Center for Big Data. It covers knowledge-graph-based question answering (KBQA), text-based question answering (TextQA), table-based question answering (TableQA), visual question answering (VisualQA), and machine reading comprehension (MRC), summarizing both academic and industrial work for each task.

  • open-compass/VLMEvalKit

    An open-source evaluation toolkit for large vision-language models (LVLMs), supporting 160+ VLMs and 50+ benchmarks.

    Language: Python · 1.5k stars
  • peteanderson80/bottom-up-attention

    Bottom-up attention model for image captioning and VQA, based on Faster R-CNN and Visual Genome

    Language: Jupyter Notebook · 1.4k stars
  • roboflow/maestro

    Streamline the fine-tuning process for multimodal models such as PaliGemma, Florence-2, and Qwen2-VL.

    Language: Python · 1.4k stars
  • NVlabs/prismer

    The implementation of "Prismer: A Vision-Language Model with Multi-Task Experts".

    Language: Python · 1.3k stars
  • microsoft/Oscar

    Oscar and VinVL

    Language: Python · 1k stars
  • hila-chefer/Transformer-MM-Explainability

    [ICCV 2021 Oral] Official PyTorch implementation of "Generic Attention-model Explainability for Interpreting Bi-Modal and Encoder-Decoder Transformers", a novel method to visualize any Transformer-based network, including examples for DETR and VQA.

    Language: Jupyter Notebook
  • hengyuan-hu/bottom-up-attention-vqa

    An efficient PyTorch implementation of the winning entry of the 2017 VQA Challenge.

    Language: Python
  • Cadene/vqa.pytorch

    Visual Question Answering in PyTorch

    Language: Python
  • jayleicn/ClipBERT

    [CVPR 2021 Best Student Paper Honorable Mention, Oral] Official PyTorch code for ClipBERT, an efficient framework for end-to-end learning on image-text and video-text tasks.

    Language: Python
  • jokieleung/awesome-visual-question-answering

    A curated list of Visual Question Answering (VQA) (image/video question answering), Visual Question Generation, Visual Dialog, Visual Commonsense Reasoning, and related areas.

  • stanfordnlp/mac-network

    Implementation for the paper "Compositional Attention Networks for Machine Reasoning" (Hudson and Manning, ICLR 2018)

    Language: Python
  • OpenGVLab/Multi-Modality-Arena

    Chatbot Arena meets multi-modality! Multi-Modality Arena allows you to benchmark vision-language models side-by-side while providing images as inputs. Supports MiniGPT-4, LLaMA-Adapter V2, LLaVA, BLIP-2, and many more!

    Language: Python
  • chingyaoc/awesome-vqa

    Visual Q&A reading list

  • vacancy/NSCL-PyTorch-Release

    PyTorch implementation for the Neuro-Symbolic Concept Learner (NS-CL).

    Language: Python
  • davidmascharka/tbd-nets

    PyTorch implementation of "Transparency by Design: Closing the Gap Between Performance and Interpretability in Visual Reasoning"

    Language: Jupyter Notebook
  • MILVLG/openvqa

    A lightweight, scalable, and general framework for visual question answering research

    Language: Python
  • FuxiaoLiu/LRV-Instruction

    [ICLR'24] Mitigating Hallucination in Large Multi-Modal Models via Robust Instruction Tuning

    Language: Python
  • abachaa/Existing-Medical-QA-Datasets

    Multimodal Question Answering in the Medical Domain: A Summary of Existing Datasets and Systems

  • Cyanogenoid/pytorch-vqa

    Strong baseline for visual question answering

    Language: Python
  • OatmealLiu/FineR

    [ICLR'24] Democratizing Fine-grained Visual Recognition with Large Language Models

    Language: Python
  • X-PLUG/mPLUG-2

    mPLUG-2: A Modularized Multi-modal Foundation Model Across Text, Image and Video (ICML 2023)

    Language: Python
  • shure-dev/Awesome-LLM-Papers-Comprehensive-Topics

    A curated list of LLM papers and repositories covering a wide range of topics.

  • yuzcccc/vqa-mfb

    Language: Python
  • linjieli222/VQA_ReGAT

    Research Code for ICCV 2019 paper "Relation-aware Graph Attention Network for Visual Question Answering"

    Language: Python
  • antoyang/FrozenBiLM

    [NeurIPS 2022] Zero-Shot Video Question Answering via Frozen Bidirectional Language Models

    Language: Python
  • yuanze-lin/REVIVE

    [NeurIPS 2022] Official Code for REVIVE: Regional Visual Representation Matters in Knowledge-Based Visual Question Answering

    Language: Python
  • thaolmk54/hcrn-videoqa

    Implementation for the paper "Hierarchical Conditional Relation Networks for Video Question Answering" (Le et al., CVPR 2020, Oral)

    Language: Python
  • vztu/VIDEVAL

    [IEEE TIP'2021] "UGC-VQA: Benchmarking Blind Video Quality Assessment for User Generated Content", Zhengzhong Tu, Yilin Wang, Neil Birkbeck, Balu Adsumilli, Alan C. Bovik

    Language: MATLAB
  • wangleihitcs/Papers

    Notes on computer vision papers the author has read, covering topics such as image captioning and weakly supervised segmentation.

  • antoyang/just-ask

    [ICCV 2021 Oral + TPAMI] Just Ask: Learning to Answer Questions from Millions of Narrated Videos

    Language: Jupyter Notebook
  • yuleiniu/cfvqa

    [CVPR 2021] Counterfactual VQA: A Cause-Effect Look at Language Bias

    Language: Python
  • cvlab-tohoku/Dense-CoAttention-Network

    Improved Fusion of Visual and Language Representations by Dense Symmetric Co-Attention for Visual Question Answering

    Language: Python
  • pairlab/SlotFormer

    Code release for the ICLR 2023 paper on SlotFormer, an object-centric dynamics model.

    Language: Python