Pinned Repositories
ikodoh.github.io
MELTR
Flipped-VQA
Large Language Models are Temporal and Causal Reasoners for Video Question Answering (EMNLP 2023)
MELTR
MELTR: Meta Loss Transformer for Learning to Fine-tune Video Foundation Models (CVPR 2023)
OVQA
Open-Vocabulary Video Question Answering: A New Benchmark for Evaluating the Generalizability of Video Question Answering Models (ICCV 2023)
Ask-Anything
[CVPR2024 Highlight][VideoChatGPT] ChatGPT with video understanding! And many more supported LMs such as miniGPT4, StableLM, and MOSS.
InternVideo2
VL-RLHF
A RLHF Infrastructure for Vision-Language Models