video-question-answering

There are 48 repositories under the video-question-answering topic.

  • OpenGVLab/Ask-Anything

    [CVPR2024 Highlight] [VideoChatGPT] ChatGPT with video understanding! It also supports many more LMs, such as miniGPT4, StableLM, and MOSS.

    Language: Python
  • OpenGVLab/InternVideo

    [ECCV2024] Video Foundation Models & Data for Multimodal Understanding

    Language: Python
  • jayleicn/ClipBERT

    [CVPR 2021 Best Student Paper Honorable Mention, Oral] Official PyTorch code for ClipBERT, an efficient framework for end-to-end learning on image-text and video-text tasks.

    Language: Python
  • Vision-CAIR/MiniGPT4-video

    Official code for Goldfish model for long video understanding and MiniGPT4-video for short video understanding

    Language: Python
  • X-PLUG/Youku-mPLUG

    Youku-mPLUG: A 10 Million Large-scale Chinese Video-Language Pre-training Dataset and Benchmarks

    Language: Python
  • X-PLUG/mPLUG-2

    mPLUG-2: A Modularized Multi-modal Foundation Model Across Text, Image and Video (ICML 2023)

    Language: Python
  • apple/ml-slowfast-llava

    SlowFast-LLaVA: A Strong Training-Free Baseline for Video Large Language Models

    Language: Python
  • salesforce/ALPRO

    Align and Prompt: Video-and-Language Pre-training with Entity Prompts

    Language: Python
  • Yui010206/SeViLA

    [NeurIPS 2023] Self-Chained Image-Language Model for Video Localization and Question Answering

    Language: Python
  • antoyang/FrozenBiLM

    [NeurIPS 2022] Zero-Shot Video Question Answering via Frozen Bidirectional Language Models

    Language: Python
  • doc-doc/NExT-QA

    NExT-QA: Next Phase of Question-Answering to Explaining Temporal Actions (CVPR'21)

    Language: Python
  • tsujuifu/pytorch_violet

    A PyTorch implementation of VIOLET

    Language: Python
  • jayleicn/TVQAplus

    [ACL 2020] PyTorch code for TVQA+: Spatio-Temporal Grounding for Video Question Answering

    Language: Python
  • jpthu17/EMCL

    [NeurIPS 2022 Spotlight] Expectation-Maximization Contrastive Learning for Compact Video-and-Language Representations

    Language: Python
  • antoyang/just-ask

    [ICCV 2021 Oral + TPAMI] Just Ask: Learning to Answer Questions from Millions of Narrated Videos

    Language: Jupyter Notebook
  • jpthu17/HBI

    [CVPR 2023 Highlight & TPAMI] Video-Text as Game Players: Hierarchical Banzhaf Interaction for Cross-Modal Representation Learning

    Language: Python
  • bytedance/Shot2Story

    A new multi-shot video understanding benchmark Shot2Story with comprehensive video summaries and detailed shot-level captions.

    Language: Python
  • mlvlab/Flipped-VQA

    Large Language Models are Temporal and Causal Reasoners for Video Question Answering (EMNLP 2023)

    Language: Python
  • doc-doc/NExT-GQA

    Can I Trust Your Answer? Visually Grounded Video Question Answering (CVPR'24, Highlight)

    Language: Python
  • whwu95/FreeVA

    FreeVA: Offline MLLM as Training-Free Video Assistant

    Language: Python
  • bcmi/Causal-VidQA

    [CVPR 2022] A large-scale public benchmark dataset for video question answering, focused on evidence and commonsense reasoning. Code for the paper "From Representation to Reasoning: Towards both Evidence and Commonsense Reasoning for Video Question-Answering".

    Language: Python
  • sail-sg/VGT

    Video Graph Transformer for Video Question Answering (ECCV'22)

    Language: Python
  • zchoi/PKOL

    [TIP 2022] Official code of paper “Video Question Answering with Prior Knowledge and Object-sensitive Learning”

    Language: Python
  • tsujuifu/pytorch_empirical-mvm

    A PyTorch implementation of EmpiricalMVM

    Language: Python
  • XLiu443/Tem-adapter

    [ICCV2023] Tem-adapter: Adapting Image-Text Pretraining for Video Question Answer

    Language: Python
  • doc-doc/HQGA

    Video as Conditional Graph Hierarchy for Multi-Granular Question Answering (AAAI'22, Oral)

    Language: Python
  • mlvlab/MELTR

    MELTR: Meta Loss Transformer for Learning to Fine-tune Video Foundation Models (CVPR 2023)

    Language: Python
  • noagarcia/knowit-rock

    ROCK model for Knowledge-Based VQA in Videos

    Language: Python
  • yl3800/IGV

    Code for Invariant Grounding for Video Question Answering.

    Language: Python
  • doc-doc/CoVGT

    Contrastive Video Question Answering via Video Graph Transformer (IEEE T-PAMI'23)

    Language: Python
  • noagarcia/ROLL-VideoQA

    PyTorch code for ROLL, a knowledge-based video story question answering model.

    Language: Python
  • mlvlab/OVQA

    Open-Vocabulary Video Question Answering: A New Benchmark for Evaluating the Generalizability of Video Question Answering Models (ICCV 2023)

    Language: Python
  • zhousheng97/ViTXT-GQA

    ✨✨ Scene-Text Grounding for Text-Based Video Question Answering (arXiv)

    Language: Python
  • mmazab/LifeQA

    Data and PyTorch code for the LifeQA LREC 2020 paper.

    Language: Python
  • declare-lab/Sealing

    [NAACL 2024] Official implementation of the paper "Self-Adaptive Sampling for Efficient Video Question Answering on Image-Text Models"

    Language: Python
  • chakravarthi589/Video-Question-Answering_Resources

    Video Question Answering | Video QA | VQA
