video-qa

There are 7 repositories under the video-qa topic.

  • sutdcv/SUTD-TrafficQA

    [CVPR2021] SUTD-TrafficQA: A Question Answering Benchmark and an Efficient Network for Video Reasoning over Traffic Events

    Language: JavaScript 49462
  • RenShuhuai-Andy/TESTA

    [EMNLP 2023] TESTA: Temporal-Spatial Token Aggregation for Long-form Video-Language Understanding

    Language: Python 43303
  • TXH-mercury/COSA

    Codes and Models for COSA: Concatenated Sample Pretrained Vision-Language Foundation Model

    Language: Python 37242
  • Kyung-Min/Deep-Embedded-Memory-Networks

    https://arxiv.org/abs/1707.00836

    Language: Jupyter Notebook 22316
  • ZJULearning/videoqa

    Unifying the Video and Question Attentions for Open-Ended Video Question Answering

    Language: Python 21424
  • yqf-oo/videoqa-stan

    Video Question Answering via Hierarchical Spatio-Temporal Attention Networks

    Language: Python 8113
  • ksm26/Large-Multimodal-Model-Prompting-with-Gemini

    Teaches you to integrate text, images, and videos into applications using Gemini's state-of-the-art multimodal models. Learn advanced prompting techniques, cross-modal reasoning, and how to extend Gemini's capabilities with real-time data and API integration.