video-question-answering

There are 48 repositories under the video-question-answering topic.

  • OpenGVLab/Ask-Anything

    [CVPR2024 Highlight] [VideoChatGPT] ChatGPT with video understanding! It also supports many more LMs, such as miniGPT4, StableLM, and MOSS.

    Language: Python
  • OpenGVLab/InternVideo

    [ECCV2024] Video Foundation Models & Data for Multimodal Understanding

    Language: Python
  • jayleicn/ClipBERT

    [CVPR 2021 Best Student Paper Honorable Mention, Oral] Official PyTorch code for ClipBERT, an efficient framework for end-to-end learning on image-text and video-text tasks.

    Language: Python
  • Vision-CAIR/MiniGPT4-video

    Official code for Goldfish model for long video understanding and MiniGPT4-video for short video understanding

    Language: Python
  • X-PLUG/Youku-mPLUG

    Youku-mPLUG: A 10 Million Large-scale Chinese Video-Language Pre-training Dataset and Benchmarks

    Language: Python
  • X-PLUG/mPLUG-2

    mPLUG-2: A Modularized Multi-modal Foundation Model Across Text, Image and Video (ICML 2023)

    Language: Python
  • apple/ml-slowfast-llava

    SlowFast-LLaVA: A Strong Training-Free Baseline for Video Large Language Models

    Language: Python
  • salesforce/ALPRO

    Align and Prompt: Video-and-Language Pre-training with Entity Prompts

    Language: Python
  • Yui010206/SeViLA

    [NeurIPS 2023] Self-Chained Image-Language Model for Video Localization and Question Answering

    Language: Python
  • antoyang/FrozenBiLM

    [NeurIPS 2022] Zero-Shot Video Question Answering via Frozen Bidirectional Language Models

    Language: Python
  • doc-doc/NExT-QA

    NExT-QA: Next Phase of Question-Answering to Explaining Temporal Actions (CVPR'21)

    Language: Python
  • tsujuifu/pytorch_violet

    A PyTorch implementation of VIOLET

    Language: Python
  • jayleicn/TVQAplus

    [ACL 2020] PyTorch code for TVQA+: Spatio-Temporal Grounding for Video Question Answering

    Language: Python
  • jpthu17/EMCL

    [NeurIPS 2022 Spotlight] Expectation-Maximization Contrastive Learning for Compact Video-and-Language Representations

    Language: Python
  • antoyang/just-ask

    [ICCV 2021 Oral + TPAMI] Just Ask: Learning to Answer Questions from Millions of Narrated Videos

    Language: Jupyter Notebook
  • jpthu17/HBI

    [CVPR 2023 Highlight & TPAMI] Video-Text as Game Players: Hierarchical Banzhaf Interaction for Cross-Modal Representation Learning

    Language: Python
  • bytedance/Shot2Story

    A new multi-shot video understanding benchmark Shot2Story with comprehensive video summaries and detailed shot-level captions.

    Language: Python
  • mlvlab/Flipped-VQA

    Large Language Models are Temporal and Causal Reasoners for Video Question Answering (EMNLP 2023)

    Language: Python
  • doc-doc/NExT-GQA

    Can I Trust Your Answer? Visually Grounded Video Question Answering (CVPR'24, Highlight)

    Language: Python
  • whwu95/FreeVA

    FreeVA: Offline MLLM as Training-Free Video Assistant

    Language: Python
  • bcmi/Causal-VidQA

    [CVPR 2022] A large-scale public benchmark dataset for video question answering, focused on evidence and commonsense reasoning. Code for the paper "From Representation to Reasoning: Towards both Evidence and Commonsense Reasoning for Video Question-Answering".

    Language: Python
  • sail-sg/VGT

    Video Graph Transformer for Video Question Answering (ECCV'22)

    Language: Python
  • zchoi/PKOL

    [TIP 2022] Official code of paper “Video Question Answering with Prior Knowledge and Object-sensitive Learning”

    Language: Python
  • tsujuifu/pytorch_empirical-mvm

    A PyTorch implementation of EmpiricalMVM

    Language: Python
  • XLiu443/Tem-adapter

    [ICCV2023] Tem-adapter: Adapting Image-Text Pretraining for Video Question Answer

    Language: Python
  • doc-doc/HQGA

    Video as Conditional Graph Hierarchy for Multi-Granular Question Answering (AAAI'22, Oral)

    Language: Python
  • mlvlab/MELTR

    MELTR: Meta Loss Transformer for Learning to Fine-tune Video Foundation Models (CVPR 2023)

    Language: Python
  • noagarcia/knowit-rock

    ROCK model for Knowledge-Based VQA in Videos

    Language: Python
  • yl3800/IGV

    Code for Invariant Grounding for Video Question Answering.

    Language: Python
  • doc-doc/CoVGT

    Contrastive Video Question Answering via Video Graph Transformer (IEEE T-PAMI'23)

    Language: Python
  • noagarcia/ROLL-VideoQA

    PyTorch code for ROLL, a knowledge-based video story question answering model.

    Language: Python
  • mlvlab/OVQA

    Open-Vocabulary Video Question Answering: A New Benchmark for Evaluating the Generalizability of Video Question Answering Models (ICCV 2023)

    Language: Python
  • zhousheng97/ViTXT-GQA

    ✨✨ Scene-Text Grounding for Text-Based Video Question Answering (arXiv)

    Language: Python
  • mmazab/LifeQA

    Data and PyTorch code for the LifeQA LREC 2020 paper.

    Language: Python
  • declare-lab/Sealing

    [NAACL 2024] Official implementation of the paper "Self-Adaptive Sampling for Efficient Video Question Answering on Image-Text Models"

    Language: Python
  • chakravarthi589/Video-Question-Answering_Resources

    Video Question Answering | Video QA | VQA
