video-qa
There are 7 repositories under the video-qa topic.
sutdcv/SUTD-TrafficQA
[CVPR2021] SUTD-TrafficQA: A Question Answering Benchmark and an Efficient Network for Video Reasoning over Traffic Events
RenShuhuai-Andy/TESTA
[EMNLP 2023] TESTA: Temporal-Spatial Token Aggregation for Long-form Video-Language Understanding
TXH-mercury/COSA
Code and models for COSA: Concatenated Sample Pretrained Vision-Language Foundation Model
Kyung-Min/Deep-Embedded-Memory-Networks
https://arxiv.org/abs/1707.00836
ZJULearning/videoqa
Unifying the Video and Question Attentions for Open-Ended Video Question Answering
yqf-oo/videoqa-stan
Video Question Answering via Hierarchical Spatio-Temporal Attention Networks
ksm26/Large-Multimodal-Model-Prompting-with-Gemini
Teaches you to integrate text, images, and videos into applications using Gemini's state-of-the-art multimodal models. Learn advanced prompting techniques, cross-modal reasoning, and how to extend Gemini's capabilities with real-time data and API integration.