ffiioonnaa

Pinned Repositories

Shot2Story
A new multi-shot video understanding benchmark, Shot2Story, with comprehensive video summaries and detailed shot-level captions.
Language: Python 84 6 135
ReadingNotes
0 1 00
volux-gan
Language: Python 27 4 65
VTG-LLM
[Preprint] VTG-LLM: Integrating Timestamp Knowledge into Video LLMs for Enhanced Video Temporal Grounding
Language: Python 43 3 91
TimeChat
[CVPR 2024] TimeChat: A Time-sensitive Multimodal Large Language Model for Long Video Understanding
Language: Python 263 5 4323
UMT
UMT is a unified and flexible framework that handles different combinations of input modalities and outputs video moment retrieval and/or highlight detection results.
Language: Python 186 6 5418
R2-Tuning
🌀 R^2-Tuning: Efficient Image-to-Video Transfer Learning for Video Temporal Grounding (ECCV 2024)
Language: Python 45 6 131
PN-Relighting
Language: Python 7 1 11